Data Info
########################################
The ExtraSensory Dataset
Primary data files (features and labels)
########################################
The ExtraSensory Dataset was collected by Yonatan Vaizman and Katherine Ellis, under the supervision of Gert Lanckriet, at the Department of Electrical and Computer Engineering, University of California, San Diego.
The dataset is publicly available. Any usage of the dataset for publications requires citing the official paper that introduced the dataset: Vaizman, Y., Ellis, K., and Lanckriet, G. "Recognizing Detailed Human Context In-the-Wild from Smartphones and Smartwatches". IEEE Pervasive Computing, vol. 16, no. 4, October-December 2017, pp. 62-74. doi:10.1109/MPRV.2017.3971131 (On the website, we refer to this original paper as Vaizman2017a.)
########################################
Content of the primary data files: There are 60 'csv.gz' files, one for each participant (user, subject) in the data collection. Each file has a filename of the form [UUID].features_labels.csv.gz, where each user has a randomly generated universally unique identifier (UUID). Each file is a textual CSV file, compressed in the gzip format.
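As an illustration of the file format, the sketch below writes a tiny gzipped CSV (with a hypothetical 'DEMO-UUID' filename and made-up values) and reads it back; pandas infers gzip compression from the '.gz' extension:

```python
import gzip

import pandas as pd

# Hypothetical filename following the [UUID].features_labels.csv.gz pattern
demo_file = 'DEMO-UUID.features_labels.csv.gz'

# Write a tiny gzipped CSV with made-up values
with gzip.open(demo_file, 'wt') as f:
    f.write('timestamp,raw_acc:magnitude_stats:mean,label:SITTING\n')
    f.write('1435000000,1.02,1\n')

# pandas infers the gzip compression from the '.gz' extension
df = pd.read_csv(demo_file)
print(df.shape)  # (1, 3): one example row, three columns
```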
Within every user's CSV file:
- The first row specifies the columns of the file.
- Each subsequent row represents one example (one minute) from the user. The examples are sorted by the primary key, the timestamp.
- The columns:
-- The first column is 'timestamp', represented as the standard number of seconds since the epoch (Unix time).
-- Next come the columns for the extracted features.
Unavailable features are represented with 'nan'.
The name of each feature indicates the sensor it was extracted from, in the form [sensor_name]:[feature_name].
The current version contains features from the following sensors, with these sensor names:
--- raw_acc: Accelerometer from the phone. The 'raw' version of acceleration (as opposed to the decomposed versions of gravity and user-acceleration).
--- proc_gyro: Gyroscope from the phone. Processed version of gyroscope measurements (the OS calculates a version that removes drift).
--- raw_magnet: Magnetometer from the phone. Raw version (as opposed to bias-fixed version that the OS also provides).
--- watch_acceleration: Accelerometer from the watch.
--- watch_heading: Heading from the compass on the watch.
--- location: Location services. These features were extracted offline for every example from the sequence of latitude-longitude-altitude updates from the example's minute.
These features capture only relative location (not absolute location in the world); that is, they describe the variability of movement within the minute.
--- location_quick_features: Location services. These features were calculated on the phone when data was collected.
These are available even in cases where the other location features are not, e.g., because the user chose to conceal their absolute location coordinates.
These quick features are very simple heuristics that approximate the more elaborate offline features.
--- audio_naive: Microphone. These naive features are simply averages and standard deviations of the 13 MFCCs from the ~20sec recording window of every example.
--- discrete: Phone-state. These are binary indicators for the state of the phone.
Notice that time_of_day features are also considered phone-state features (they also have the prefix 'discrete:'), but their columns do not appear immediately after the other 'discrete' columns.
--- lf_measurements: Various sensors that were recorded at low frequency (once per example).
-- Third come the columns for the ground-truth labels. The values are either 1 (the label is relevant for the example), 0 (the label is not relevant for the example), or 'nan' (the label is considered 'missing' for this example). Originally, users could only report 'positive' labels (in the original ExtraSensory paper, Vaizman2017a, we assumed that an unreported label was a 'negative' example). This cleaned version of the labels has the notion of 'missing labels'; details about how the missing-label information was inferred are provided in the second paper, Vaizman2017b (see http://extrasensory.ucsd.edu for updated references). The names of the labels have the prefix 'label:'. After the prefix:
--- If the label name is all capitalized, it is an original label from the mobile app's interface, and the values are those the user originally reported.
--- If the label name begins with 'FIX_', it is a fixed/cleaned version of a corresponding label: the researchers fixed some of the values reported by users because of inconsistencies.
--- If the label name begins with 'OR_', it is a synthesized label: it did not appear in the app's label menu; the researchers created it as a combination (logical OR) of other related labels.
--- If the label name begins with 'LOC_', it is a fixed/cleaned version of a corresponding label, fixed by the researchers based on absolute location. LOC_beach was based on the original label 'AT_THE_BEACH', LOC_home on 'AT_HOME', and LOC_main_workplace on 'AT_WORK'.
-- Fourth, the last column is label_source, describing where the original labeling came from in the mobile app's interface. It has 8 possible values:
--- -1: The user did not report any labels for this example (notice, however, that the example may still have labeling for the 'LOC_' labels).
--- 0: The user used the 'active feedback' interface (reporting the immediate future). This example is the first in the relevant minute sequence.
--- 1: The user used the 'active feedback' interface. This example is a continuation of a sequence of minutes since the user started the reported context.
--- 2: The user used the history interface to label an example from the past.
--- 3: The user replied to a notification that simply asked to provide any labels.
--- 4: The user replied to a notification that asked 'In the past [minutes] minutes were you still [recent context]?' and answered 'correct' on the phone.
--- 5: The user replied to such a notification with 'not exactly' and then corrected the context labels.
--- 6: The user replied to such a notification with 'correct' on the watch interface.
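To make the column conventions above concrete, here is a small sketch (using a hypothetical, abbreviated column list) that separates feature columns from label columns and spells out the label_source codes:

```python
# Hypothetical, abbreviated column list for illustration only
columns = ['timestamp',
           'raw_acc:magnitude_stats:mean',
           'audio_naive:mfcc0:mean',
           'label:SITTING', 'label:FIX_walking', 'label:OR_indoors',
           'label_source']

# Labels carry the 'label:' prefix; everything else except the
# timestamp and label_source is a sensor feature
label_columns = [c for c in columns if c.startswith('label:')]
feature_columns = [c for c in columns
                   if c not in label_columns and c not in ('timestamp', 'label_source')]

# Meaning of the label_source codes, per the description above
LABEL_SOURCE = {
    -1: "no labels reported (LOC_ labels may still be set)",
    0: "active feedback, first minute of the sequence",
    1: "active feedback, continuation of the sequence",
    2: "history interface (labeling the past)",
    3: "notification: asked for any labels",
    4: "notification: 'still ...?', answered 'correct' on the phone",
    5: "notification: 'still ...?', answered 'not exactly', then corrected",
    6: "notification: 'still ...?', answered 'correct' on the watch",
}

print(label_columns)         # ['label:SITTING', 'label:FIX_walking', 'label:OR_indoors']
print(len(feature_columns))  # 2
```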
########################################
Data Consolidation
Combine the data from all the different files into one CSV.
Plan from here: build a new CSV file for generated labels and the real labels for the model in progress (use an LSTM model to predict the next user data and the next activity, and create a row for it), then concatenate into a single CSV (no need to rename files beforehand).
Two CSV files?
Possibly use the labels to predict the next user data.
# Standard library imports
import gzip
import os
import pickle
import shutil
import zipfile
# Third-party imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from IPython.display import Markdown, display
from keras.layers import Dense, Dropout, LSTM
from keras.models import Sequential
from keras.preprocessing.sequence import TimeseriesGenerator
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from skmultilearn.problem_transform import ClassifierChain
from tensorflow.keras.preprocessing.sequence import pad_sequences
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# Making sure ExtraSensory.per_uuid_features_labels.zip exists and is unzipped
def unzip(zip_file):
    # Extract to the directory derived from the zip file name
    zip_extract_to = zip_file.replace('.zip', '')
    # Unzipping
    if os.path.exists(zip_file):
        if not os.path.exists(zip_extract_to):
            os.makedirs(zip_extract_to)
            with zipfile.ZipFile(zip_file, 'r') as zip_ref:
                zip_ref.extractall(zip_extract_to)
            message = "Unzipped successfully."
        else:
            message = "Directory already exists. File might be unzipped."
    else:
        message = "Zip file not found."
    print(message)
    return zip_extract_to
def csv_extract(zip_extract_to):
    # Directory where the extracted files will be saved
    unzipped_data_dir = f"{zip_extract_to}-Unzipped"
    # Create the unzipped data directory if it does not exist
    if not os.path.exists(unzipped_data_dir):
        os.makedirs(unzipped_data_dir)
    # Extract the .csv.gz files
    extraction_message = ""
    if os.path.exists(zip_extract_to):
        for file in os.listdir(zip_extract_to):
            if file.endswith('.gz'):
                gz_file_path = os.path.join(zip_extract_to, file)
                csv_file_path = os.path.join(unzipped_data_dir, file[:-3])  # Remove '.gz' from the filename
                try:
                    with gzip.open(gz_file_path, 'rb') as f_in:
                        with open(csv_file_path, 'wb') as f_out:
                            shutil.copyfileobj(f_in, f_out)
                    extraction_message += f"Extracted {file}\n"
                except Exception as e:
                    extraction_message += f"Error extracting {file}: {e}\n"
    else:
        extraction_message = "Directory with .gz files not found."
    print(extraction_message.strip())
    return unzipped_data_dir
# Function to extract the user_id from a filename of the form [UUID].features_labels.csv
def extract_user_id(filename):
    return filename.split('.')[0]

def make_one_csv(unzipped_data_dir, COMBINED_FILE):
    # Combine all CSVs into one dataframe
    combined_csv_data = pd.DataFrame()
    if os.path.exists(unzipped_data_dir):
        for file in os.listdir(unzipped_data_dir):
            if file.endswith('.csv'):
                file_path = os.path.join(unzipped_data_dir, file)
                user_id = extract_user_id(file)
                # Read the CSV file and add the user_id column
                csv_data = pd.read_csv(file_path)
                csv_data['user_id'] = user_id
                # Append to the combined dataframe
                combined_csv_data = pd.concat([combined_csv_data, csv_data], ignore_index=True)
                # print(f"Processed file: {file} \nCurrent size of combined data: {combined_csv_data.shape}")
        # Check whether any data has been combined
        if not combined_csv_data.empty:
            # Save the combined CSV data to a file
            combined_csv_data.to_csv(COMBINED_FILE, index=False)
            print(f"Combined CSV file created at {COMBINED_FILE}.")
        else:
            print("No CSV files found to combine or combined data is empty.")
    else:
        print("Directory with unzipped CSV files not found.")
    return COMBINED_FILE
COMBINED_FILE = 'ExtraSensory_Combined_User_Data.csv'
if not os.path.exists(COMBINED_FILE):
    # Path of the zip file
    zip_file = 'ExtraSensory.per_uuid_features_labels.zip'
    zip_extract_to = unzip(zip_file)
    unzipped_data_dir = csv_extract(zip_extract_to)
    make_one_csv(unzipped_data_dir, COMBINED_FILE)
else:
    print('Combined file already exists.')
Combined file already exists.
Data Exploration
combined_csv_data = pd.read_csv(COMBINED_FILE)
combined_csv_data['timestamp'] = pd.to_datetime(combined_csv_data['timestamp'], unit='s')
print(combined_csv_data.columns)
# user_id lets us keep a record of the source of each row.
Index(['timestamp', 'raw_acc:magnitude_stats:mean',
'raw_acc:magnitude_stats:std', 'raw_acc:magnitude_stats:moment3',
'raw_acc:magnitude_stats:moment4',
'raw_acc:magnitude_stats:percentile25',
'raw_acc:magnitude_stats:percentile50',
'raw_acc:magnitude_stats:percentile75',
'raw_acc:magnitude_stats:value_entropy',
'raw_acc:magnitude_stats:time_entropy',
...
'label:ELEVATOR', 'label:OR_standing', 'label:AT_SCHOOL',
'label:PHONE_IN_HAND', 'label:PHONE_IN_BAG', 'label:PHONE_ON_TABLE',
'label:WITH_CO-WORKERS', 'label:WITH_FRIENDS', 'label_source',
'user_id'],
dtype='object', length=279)
# Trying to understand columns
def build_hierarchy(columns):
    # Build a nested dictionary representing the hierarchy of columns
    hierarchy = {}
    for col in columns:
        parts = col.split(':')
        current_level = hierarchy
        for part in parts[:-1]:
            current_level = current_level.setdefault(part, {})
        current_level[parts[-1]] = col
    return hierarchy

def format_hierarchy(hierarchy, indent=0):
    # Format the hierarchy into a readable string with indentation
    result = ""
    for key, value in hierarchy.items():
        prefix = "    " * indent + "- "
        if isinstance(value, dict):
            result += f"{prefix}{key}:\n{format_hierarchy(value, indent + 1)}"
        else:
            result += f"{prefix}{key}\n"
    return result
# Building and formatting the hierarchy
hierarchy = build_hierarchy(combined_csv_data.columns)
formatted_hierarchy = format_hierarchy(hierarchy)
print(formatted_hierarchy)
- timestamp
- raw_acc:
- magnitude_stats:
- mean
- std
- moment3
- moment4
- percentile25
- percentile50
- percentile75
- value_entropy
- time_entropy
- magnitude_spectrum:
- log_energy_band0
- log_energy_band1
- log_energy_band2
- log_energy_band3
- log_energy_band4
- spectral_entropy
- magnitude_autocorrelation:
- period
- normalized_ac
- 3d:
- mean_x
- mean_y
- mean_z
- std_x
- std_y
- std_z
- ro_xy
- ro_xz
- ro_yz
- proc_gyro:
- magnitude_stats:
- mean
- std
- moment3
- moment4
- percentile25
- percentile50
- percentile75
- value_entropy
- time_entropy
- magnitude_spectrum:
- log_energy_band0
- log_energy_band1
- log_energy_band2
- log_energy_band3
- log_energy_band4
- spectral_entropy
- magnitude_autocorrelation:
- period
- normalized_ac
- 3d:
- mean_x
- mean_y
- mean_z
- std_x
- std_y
- std_z
- ro_xy
- ro_xz
- ro_yz
- raw_magnet:
- magnitude_stats:
- mean
- std
- moment3
- moment4
- percentile25
- percentile50
- percentile75
- value_entropy
- time_entropy
- magnitude_spectrum:
- log_energy_band0
- log_energy_band1
- log_energy_band2
- log_energy_band3
- log_energy_band4
- spectral_entropy
- magnitude_autocorrelation:
- period
- normalized_ac
- 3d:
- mean_x
- mean_y
- mean_z
- std_x
- std_y
- std_z
- ro_xy
- ro_xz
- ro_yz
- avr_cosine_similarity_lag_range0
- avr_cosine_similarity_lag_range1
- avr_cosine_similarity_lag_range2
- avr_cosine_similarity_lag_range3
- avr_cosine_similarity_lag_range4
- watch_acceleration:
- magnitude_stats:
- mean
- std
- moment3
- moment4
- percentile25
- percentile50
- percentile75
- value_entropy
- time_entropy
- magnitude_spectrum:
- log_energy_band0
- log_energy_band1
- log_energy_band2
- log_energy_band3
- log_energy_band4
- spectral_entropy
- magnitude_autocorrelation:
- period
- normalized_ac
- 3d:
- mean_x
- mean_y
- mean_z
- std_x
- std_y
- std_z
- ro_xy
- ro_xz
- ro_yz
- spectrum:
- x_log_energy_band0
- x_log_energy_band1
- x_log_energy_band2
- x_log_energy_band3
- x_log_energy_band4
- y_log_energy_band0
- y_log_energy_band1
- y_log_energy_band2
- y_log_energy_band3
- y_log_energy_band4
- z_log_energy_band0
- z_log_energy_band1
- z_log_energy_band2
- z_log_energy_band3
- z_log_energy_band4
- relative_directions:
- avr_cosine_similarity_lag_range0
- avr_cosine_similarity_lag_range1
- avr_cosine_similarity_lag_range2
- avr_cosine_similarity_lag_range3
- avr_cosine_similarity_lag_range4
- watch_heading:
- mean_cos
- std_cos
- mom3_cos
- mom4_cos
- mean_sin
- std_sin
- mom3_sin
- mom4_sin
- entropy_8bins
- location:
- num_valid_updates
- log_latitude_range
- log_longitude_range
- min_altitude
- max_altitude
- min_speed
- max_speed
- best_horizontal_accuracy
- best_vertical_accuracy
- diameter
- log_diameter
- location_quick_features:
- std_lat
- std_long
- lat_change
- long_change
- mean_abs_lat_deriv
- mean_abs_long_deriv
- audio_naive:
- mfcc0:
- mean
- std
- mfcc1:
- mean
- std
- mfcc2:
- mean
- std
- mfcc3:
- mean
- std
- mfcc4:
- mean
- std
- mfcc5:
- mean
- std
- mfcc6:
- mean
- std
- mfcc7:
- mean
- std
- mfcc8:
- mean
- std
- mfcc9:
- mean
- std
- mfcc10:
- mean
- std
- mfcc11:
- mean
- std
- mfcc12:
- mean
- std
- audio_properties:
- max_abs_value
- normalization_multiplier
- discrete:
- app_state:
- is_active
- is_inactive
- is_background
- missing
- battery_plugged:
- is_ac
- is_usb
- is_wireless
- missing
- battery_state:
- is_unknown
- is_unplugged
- is_not_charging
- is_discharging
- is_charging
- is_full
- missing
- on_the_phone:
- is_False
- is_True
- missing
- ringer_mode:
- is_normal
- is_silent_no_vibrate
- is_silent_with_vibrate
- missing
- wifi_status:
- is_not_reachable
- is_reachable_via_wifi
- is_reachable_via_wwan
- missing
- time_of_day:
- between0and6
- between3and9
- between6and12
- between9and15
- between12and18
- between15and21
- between18and24
- between21and3
- lf_measurements:
- light
- pressure
- proximity_cm
- proximity
- relative_humidity
- battery_level
- screen_brightness
- temperature_ambient
- label:
- LYING_DOWN
- SITTING
- FIX_walking
- FIX_running
- BICYCLING
- SLEEPING
- LAB_WORK
- IN_CLASS
- IN_A_MEETING
- LOC_main_workplace
- OR_indoors
- OR_outside
- IN_A_CAR
- ON_A_BUS
- DRIVE_-_I_M_THE_DRIVER
- DRIVE_-_I_M_A_PASSENGER
- LOC_home
- FIX_restaurant
- PHONE_IN_POCKET
- OR_exercise
- COOKING
- SHOPPING
- STROLLING
- DRINKING__ALCOHOL_
- BATHING_-_SHOWER
- CLEANING
- DOING_LAUNDRY
- WASHING_DISHES
- WATCHING_TV
- SURFING_THE_INTERNET
- AT_A_PARTY
- AT_A_BAR
- LOC_beach
- SINGING
- TALKING
- COMPUTER_WORK
- EATING
- TOILET
- GROOMING
- DRESSING
- AT_THE_GYM
- STAIRS_-_GOING_UP
- STAIRS_-_GOING_DOWN
- ELEVATOR
- OR_standing
- AT_SCHOOL
- PHONE_IN_HAND
- PHONE_IN_BAG
- PHONE_ON_TABLE
- WITH_CO-WORKERS
- WITH_FRIENDS
- label_source
- user_id
# List of label columns to check
label_columns = [col for col in combined_csv_data.columns if col.startswith("label:")]
# Assume unreported labels are negatives in the ground truth
combined_csv_data[label_columns] = combined_csv_data[label_columns].fillna(0)
combined_csv_data['label_sum_initial'] = combined_csv_data[label_columns].sum(axis=1)
combined_csv_data['label:UNKNOWN'] = (combined_csv_data['label_sum_initial'] == 0).astype(float)
label_columns.append('label:UNKNOWN')
combined_csv_data = combined_csv_data.drop('label_sum_initial', axis=1)
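The same cleaning step can be sketched on a tiny hypothetical frame: unreported labels are assumed negative (filled with 0), and rows with no positive label at all receive a synthetic 'label:UNKNOWN':

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame: two label columns; the second row has no positives
toy = pd.DataFrame({'label:SITTING': [1.0, np.nan],
                    'label:FIX_walking': [np.nan, np.nan]})
label_cols = [c for c in toy.columns if c.startswith('label:')]

# Assume missing means negative, then flag rows with no positive label
toy[label_cols] = toy[label_cols].fillna(0)
toy['label:UNKNOWN'] = (toy[label_cols].sum(axis=1) == 0).astype(float)
print(toy['label:UNKNOWN'].tolist())  # [0.0, 1.0]
```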
df = combined_csv_data.copy()
# Function to find the name of the first label with value 1
def find_label_name(row):
    for col in label_columns:
        if row[col] == 1:
            return col.split("label:")[1]
    return None
# Checking ground truth labels for value counts
column_sums = df[label_columns].sum()
column_sums_sorted = column_sums.sort_values(ascending=True)
# Plot setup
plt.figure(figsize=(10, 14))
column_sums_sorted.plot(kind='barh')
plt.title('Count of Positive Examples per Label')
plt.xlabel('Count')
plt.ylabel('Label')
plt.tight_layout()
plt.show()
# Deleting the other columns
unneeded_columns = ['user_id', 'label_source']
output_columns = [col for col in combined_csv_data.columns if col.startswith('label:')]
input_columns = [col for col in combined_csv_data.columns if col not in output_columns and col not in unneeded_columns]
X_main = df.copy()[input_columns]
# Checking input variables for missing values
def nan_percentage(df):
    nan_percentage = (df.isna().mean() * 100).round(2)
    nan_percentage_df = pd.DataFrame({'Variable': nan_percentage.index, 'NaN Percentage': nan_percentage.values})
    nan_percentage_df = nan_percentage_df.sort_values(by='NaN Percentage', ascending=True)
    # Plot setup
    plt.figure(figsize=(10, 35))
    plt.barh(nan_percentage_df['Variable'], nan_percentage_df['NaN Percentage'], color='skyblue')
    plt.xlabel('Percentage of Missing Values')
    plt.ylabel('Variables')
    plt.title('Missing Value Percentage by Variable')
    plt.grid(axis='x')
    plt.tight_layout()
    plt.show()
    return nan_percentage_df
# Checking overall missing percentage
nan_percentage_df = nan_percentage(X_main)
X_with_users = df.drop(columns=['label_source'])
users = X_with_users['user_id'].unique()
len(users)
60
features = input_columns
# Initialize a list to store the counts for each user
nan_counts_list = []
# Loop through each user
for user in users:
    # Filter the DataFrame for the current user
    df_user = X_with_users[X_with_users['user_id'] == user]
    # Count the NaN values for each feature for the current user
    nan_count = df_user[features].isna().sum()
    nan_count['user_id'] = user  # Add user_id to the count
    # Append the count series to the list
    nan_counts_list.append(nan_count)
# Convert the list of Series to a DataFrame
nan_counts_per_user = pd.DataFrame(nan_counts_list)
# If needed, set the user_id as the index
nan_counts_per_user.set_index('user_id', inplace=True)
# Plot the heat map
plt.figure(figsize=(30, 30))
sns.heatmap(nan_counts_per_user, annot=False, cmap='Reds')
plt.title('Heatmap of Missing Values per User')
plt.xlabel('Features')
plt.ylabel('Users')
plt.show()
# Calculate the total number of rows for each user in the original DataFrame
user_total_length = X_with_users.groupby('user_id').size()
# Convert this to a DataFrame or a Series that can be added to nan_counts_per_user
user_total_length_df = user_total_length.to_frame(name='total_length')
# Merge this information with nan_counts_per_user
# Since nan_counts_per_user already has user_id as its index, we can directly add the new column
nan_counts_per_user['total_length'] = user_total_length_df['total_length']
# Now, nan_counts_per_user includes the total_length column
# Calculate the total NaN count for each feature across all users
total_nan_counts = nan_counts_per_user.sum()
# Total number of entries for each feature across all users
total_entries_per_feature = len(X_with_users)
# Calculate the percentage of missing data for each feature
percentage_missing = (total_nan_counts / total_entries_per_feature) * 100
# Threshold (in percent) for removing columns; 0 removes any column with missing values
threshold = 0
# Identify columns that exceed this threshold
columns_to_remove = percentage_missing[percentage_missing > threshold].index.tolist()
print("Removing ", len(columns_to_remove),"columns out of ", len(X_with_users.columns))
# Print out the columns to remove
# print("Columns to remove due to excessive missing data:", columns_to_remove)
features_to_include = [feature for feature in features if feature not in columns_to_remove]
if 'timestamp' in X_with_users.columns:
    X_with_users['timestamp_numeric'] = X_with_users['timestamp'].astype(np.int64) // 10**9
    # Ensure 'timestamp_numeric' is included and 'timestamp' is excluded from features_to_include
    features_to_include = [f for f in features_to_include if f != 'timestamp'] + ['timestamp_numeric']
# Example: median-impute the missing features for a single user (the last one)
user_df = X_with_users[X_with_users['user_id'] == users[-1]]
median_values = user_df[features_to_include].median()
user_df = user_df[features_to_include].fillna(median_values)
Removing 192 columns out of 279
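The per-user median imputation used above can be illustrated on a toy single-column frame (made-up values):

```python
import numpy as np
import pandas as pd

# One feature column with a single missing value
toy = pd.DataFrame({'feat': [1.0, np.nan, 3.0]})

# Median of the observed values [1.0, 3.0] is 2.0; fill the gap with it
median_values = toy[['feat']].median()
filled = toy[['feat']].fillna(median_values)
print(filled['feat'].tolist())  # [1.0, 2.0, 3.0]
```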
Testing whether we need Binary Relevance or Classifier Chains. We could also check Label Powerset, but its label combinations can become very large since we have 52 labels.
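As a minimal sketch of the two transformations (using scikit-learn's own sklearn.multioutput implementations rather than skmultilearn, on synthetic data): Binary Relevance trains one independent classifier per label, while a Classifier Chain feeds each label's prediction to the next classifier, letting it exploit label correlations.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# Synthetic multi-label data; only the shapes mirror our setting
X, Y = make_multilabel_classification(n_samples=200, n_features=10,
                                      n_classes=4, random_state=42)

# Binary Relevance: one independent classifier per label
br = MultiOutputClassifier(RandomForestClassifier(random_state=42)).fit(X, Y)

# Classifier Chain: each classifier also sees the previous labels in the chain
cc = ClassifierChain(RandomForestClassifier(random_state=42), random_state=42).fit(X, Y)

print(br.predict(X[:5]).shape, cc.predict(X[:5]).shape)  # (5, 4) (5, 4)
```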
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np
combined_csv_data_4_model = combined_csv_data.copy()
combined_csv_data_4_model['timestamp_numeric'] = combined_csv_data_4_model['timestamp'].astype(np.int64) // 10**9
combined_csv_data_4_model = combined_csv_data_4_model.drop(columns=['timestamp'])
X = combined_csv_data_4_model[features_to_include]
y = combined_csv_data_4_model[output_columns]
# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
corr_matrix = y.corr(method='pearson')  # Compute the correlation matrix between labels
# Flatten the matrix, sort by absolute value while preserving names
corr_flat = corr_matrix.unstack()
corr_flat_sorted = corr_flat.abs().sort_values(ascending=False)
# Remove self-correlations
corr_flat_sorted = corr_flat_sorted[corr_flat_sorted < 1]
# Take the top N correlations for plotting (for simplicity, let's plot all unique pairs)
unique_pairs = corr_flat_sorted.drop_duplicates().head(10)
# Plotting
plt.figure(figsize=(10, 6))
unique_pairs.plot(kind='bar')
plt.title('Top Correlations Between Labels')
plt.xlabel('Label Pairs')
plt.ylabel('Correlation')
plt.xticks(rotation=45, ha='right')
plt.show()
It seems there is correlation between labels; let's use Classifier Chains.
plt.figure(figsize=(20, 20))
corr = y.corr(method='pearson')
sns.heatmap(corr, annot=False, cmap='coolwarm')
plt.show()
combined_csv_data.shape
(377346, 280)
# Use roughly the first tenth of the rows for a quick test of the pipeline
combined_csv_data_4_model = combined_csv_data.iloc[:37734, :].copy()
combined_csv_data_4_model['timestamp_numeric'] = combined_csv_data_4_model['timestamp'].astype(np.int64) // 10**9
combined_csv_data_4_model = combined_csv_data_4_model.drop(columns=['timestamp'])
X = combined_csv_data_4_model[features_to_include]
y = combined_csv_data_4_model[output_columns]
# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
# Initialize Classifier Chain with a RandomForest base classifier
classifier = ClassifierChain(RandomForestClassifier())
# Train the Classifier Chain model
classifier.fit(X_train, y_train)
# Make predictions
predictions = classifier.predict(X_test)
# Note: accuracy_score expects single-label predictions, so for multi-label
# data we use another metric; here is a custom subset accuracy
# (the fraction of examples whose entire label vector is predicted correctly)
def subset_accuracy(y_true, y_pred):
    return (y_true == y_pred).all(axis=1).mean()

print("Subset Accuracy: ", subset_accuracy(y_test, predictions.toarray()))
Subset Accuracy: 0.956804028090632
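Subset accuracy requires every label of an example to match, so it is a strict metric. Hamming loss, mentioned in the comment above, instead counts the fraction of individual label slots that are wrong; a quick sketch on made-up predictions:

```python
import numpy as np
from sklearn.metrics import hamming_loss

# Made-up multi-label ground truth and predictions
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 1, 1], [0, 1, 0]])

# One wrong label slot out of six: 1/6
print(hamming_loss(y_true, y_pred))
```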
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np
import pickle
user = users[0]
models_data = {
'models': {},
'accuracies': {}
}
combined_csv_data_4_model = combined_csv_data.copy()
combined_csv_data_4_model['timestamp_numeric'] = combined_csv_data_4_model['timestamp'].astype(np.int64) // 10**9
combined_csv_data_4_model = combined_csv_data_4_model.drop(columns=['timestamp'])
import warnings
warnings.simplefilter(action='ignore', category=RuntimeWarning)
counter = 1
for user in users:
    user_df = combined_csv_data_4_model[combined_csv_data['user_id'] == user]
    print(f'Shape of df of user no. {counter} of id {user} is: {user_df.shape}')
    X = user_df[features_to_include]
    y = user_df[output_columns]
    # Normalize features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    # Split the dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
    # Initialize a Classifier Chain with a RandomForest base classifier
    classifier = ClassifierChain(RandomForestClassifier())
    # Train the Classifier Chain model
    classifier.fit(X_train, y_train)
    models_data['models'][user] = classifier
    # Make predictions and evaluate with subset accuracy
    # (accuracy_score expects single-label predictions, so it is not used here)
    predictions = classifier.predict(X_test)
    accuracy = subset_accuracy(y_test, predictions.toarray())
    models_data['accuracies'][user] = accuracy
    print(f"Subset Accuracy for {user}: ", accuracy)
    # Persist the models and accuracies after each user
    with open('clfs_2.pkl', 'wb') as file:
        pickle.dump(models_data, file)
    print('File Updated')
    counter = counter + 1
Shape of df of user no. 1 of id 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0 is: (6407, 280) Subset Accuracy for 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0: 0.9446177847113885 File Updated Shape of df of user no. 2 of id 61359772-D8D8-480D-B623-7C636EAD0C81 is: (6079, 280) Subset Accuracy for 61359772-D8D8-480D-B623-7C636EAD0C81: 0.9769736842105263 File Updated Shape of df of user no. 3 of id 40E170A7-607B-4578-AF04-F021C3B0384A is: (7649, 280) Subset Accuracy for 40E170A7-607B-4578-AF04-F021C3B0384A: 0.949673202614379 File Updated Shape of df of user no. 4 of id 806289BC-AD52-4CC1-806C-0CDB14D65EB6 is: (9242, 280) Subset Accuracy for 806289BC-AD52-4CC1-806C-0CDB14D65EB6: 0.9502433747971877 File Updated Shape of df of user no. 5 of id 61976C24-1C50-4355-9C49-AAE44A7D09F6 is: (8730, 280) Subset Accuracy for 61976C24-1C50-4355-9C49-AAE44A7D09F6: 0.9616265750286369 File Updated Shape of df of user no. 6 of id D7D20E2E-FC78-405D-B346-DBD3FD8FC92B is: (6210, 280) Subset Accuracy for D7D20E2E-FC78-405D-B346-DBD3FD8FC92B: 0.9082125603864735 File Updated Shape of df of user no. 7 of id 7D9BB102-A612-4E2A-8E22-3159752F55D8 is: (1600, 280) Subset Accuracy for 7D9BB102-A612-4E2A-8E22-3159752F55D8: 0.934375 File Updated Shape of df of user no. 8 of id 5119D0F8-FCA8-4184-A4EB-19421A40DE0D is: (6617, 280) Subset Accuracy for 5119D0F8-FCA8-4184-A4EB-19421A40DE0D: 0.9350453172205438 File Updated Shape of df of user no. 9 of id 9DC38D04-E82E-4F29-AB52-B476535226F2 is: (9686, 280) Subset Accuracy for 9DC38D04-E82E-4F29-AB52-B476535226F2: 0.8668730650154799 File Updated Shape of df of user no. 10 of id A7599A50-24AE-46A6-8EA6-2576F1011D81 is: (3898, 280) Subset Accuracy for A7599A50-24AE-46A6-8EA6-2576F1011D81: 0.9807692307692307 File Updated Shape of df of user no. 11 of id 59EEFAE0-DEB0-4FFF-9250-54D2A03D0CF2 is: (7542, 280) Subset Accuracy for 59EEFAE0-DEB0-4FFF-9250-54D2A03D0CF2: 0.9390324718356527 File Updated Shape of df of user no. 
12 of id 24E40C4C-A349-4F9F-93AB-01D00FB994AF is: (4771, 280) Subset Accuracy for 24E40C4C-A349-4F9F-93AB-01D00FB994AF: 0.9099476439790576 File Updated Shape of df of user no. 13 of id 9759096F-1119-4E19-A0AD-6F16989C7E1C is: (9959, 280) Subset Accuracy for 9759096F-1119-4E19-A0AD-6F16989C7E1C: 0.9357429718875502 File Updated Shape of df of user no. 14 of id 1155FF54-63D3-4AB2-9863-8385D0BD0A13 is: (2685, 280) Subset Accuracy for 1155FF54-63D3-4AB2-9863-8385D0BD0A13: 0.8677839851024208 File Updated Shape of df of user no. 15 of id 96A358A0-FFF2-4239-B93E-C7425B901B47 is: (5819, 280) Subset Accuracy for 96A358A0-FFF2-4239-B93E-C7425B901B47: 0.9690721649484536 File Updated Shape of df of user no. 16 of id 78A91A4E-4A51-4065-BDA7-94755F0BB3BB is: (11996, 280) Subset Accuracy for 78A91A4E-4A51-4065-BDA7-94755F0BB3BB: 0.97 File Updated Shape of df of user no. 17 of id F50235E0-DD67-4F2A-B00B-1F31ADA998B9 is: (2266, 280) Subset Accuracy for F50235E0-DD67-4F2A-B00B-1F31ADA998B9: 0.8325991189427313 File Updated Shape of df of user no. 18 of id 1538C99F-BA1E-4EFB-A949-6C7C47701B20 is: (6549, 280) Subset Accuracy for 1538C99F-BA1E-4EFB-A949-6C7C47701B20: 0.9564885496183206 File Updated Shape of df of user no. 19 of id 11B5EC4D-4133-4289-B475-4E737182A406 is: (8845, 280) Subset Accuracy for 11B5EC4D-4133-4289-B475-4E737182A406: 0.9101187111362352 File Updated Shape of df of user no. 20 of id 098A72A5-E3E5-4F54-A152-BBDA0DF7B694 is: (6813, 280) Subset Accuracy for 098A72A5-E3E5-4F54-A152-BBDA0DF7B694: 0.9053558327219369 File Updated Shape of df of user no. 21 of id 59818CD2-24D7-4D32-B133-24C2FE3801E5 is: (5947, 280) Subset Accuracy for 59818CD2-24D7-4D32-B133-24C2FE3801E5: 0.9403361344537815 File Updated Shape of df of user no. 22 of id 33A85C34-CFE4-4732-9E73-0A7AC861B27A is: (6172, 280) Subset Accuracy for 33A85C34-CFE4-4732-9E73-0A7AC861B27A: 0.9465587044534413 File Updated Shape of df of user no. 
23 of id 00EABED2-271D-49D8-B599-1D4A09240601 is: (2287, 280) Subset Accuracy for 00EABED2-271D-49D8-B599-1D4A09240601: 0.8013100436681223 File Updated Shape of df of user no. 24 of id 136562B6-95B2-483D-88DC-065F28409FD2 is: (6218, 280) Subset Accuracy for 136562B6-95B2-483D-88DC-065F28409FD2: 0.8593247588424437 File Updated Shape of df of user no. 25 of id B9724848-C7E2-45F4-9B3F-A1F38D864495 is: (7626, 280) Subset Accuracy for B9724848-C7E2-45F4-9B3F-A1F38D864495: 0.9462647444298821 File Updated Shape of df of user no. 26 of id CF722AA9-2533-4E51-9FEB-9EAC84EE9AAC is: (3615, 280) Subset Accuracy for CF722AA9-2533-4E51-9FEB-9EAC84EE9AAC: 0.8616874135546335 File Updated Shape of df of user no. 27 of id FDAA70A1-42A3-4E3F-9AE3-3FDA412E03BF is: (4973, 280) Subset Accuracy for FDAA70A1-42A3-4E3F-9AE3-3FDA412E03BF: 0.9587939698492463 File Updated Shape of df of user no. 28 of id A5CDF89D-02A2-4EC1-89F8-F534FDABDD96 is: (6040, 280) Subset Accuracy for A5CDF89D-02A2-4EC1-89F8-F534FDABDD96: 0.7971854304635762 File Updated Shape of df of user no. 29 of id 0BFC35E2-4817-4865-BFA7-764742302A2D is: (3108, 280) Subset Accuracy for 0BFC35E2-4817-4865-BFA7-764742302A2D: 0.905144694533762 File Updated Shape of df of user no. 30 of id BEF6C611-50DA-4971-A040-87FB979F3FC1 is: (3451, 280) Subset Accuracy for BEF6C611-50DA-4971-A040-87FB979F3FC1: 0.9507959479015919 File Updated Shape of df of user no. 31 of id 4FC32141-E888-4BFF-8804-12559A491D8C is: (4979, 280) Subset Accuracy for 4FC32141-E888-4BFF-8804-12559A491D8C: 0.9257028112449799 File Updated Shape of df of user no. 32 of id A76A5AF5-5A93-4CF2-A16E-62353BB70E8A is: (7520, 280) Subset Accuracy for A76A5AF5-5A93-4CF2-A16E-62353BB70E8A: 0.9394946808510638 File Updated Shape of df of user no. 33 of id 3600D531-0C55-44A7-AE95-A7A38519464E is: (5203, 280) Subset Accuracy for 3600D531-0C55-44A7-AE95-A7A38519464E: 0.9615754082612872 File Updated Shape of df of user no. 
34 of id 2C32C23E-E30C-498A-8DD2-0EFB9150A02E is: (8516, 280) Subset Accuracy for 2C32C23E-E30C-498A-8DD2-0EFB9150A02E: 0.9401408450704225 File Updated Shape of df of user no. 35 of id 86A4F379-B305-473D-9D83-FC7D800180EF is: (10738, 280) Subset Accuracy for 86A4F379-B305-473D-9D83-FC7D800180EF: 0.9762569832402235 File Updated Shape of df of user no. 36 of id 99B204C0-DD5C-4BB7-83E8-A37281B8D769 is: (6038, 280) Subset Accuracy for 99B204C0-DD5C-4BB7-83E8-A37281B8D769: 0.9271523178807947 File Updated Shape of df of user no. 37 of id 74B86067-5D4B-43CF-82CF-341B76BEA0F4 is: (7298, 280) Subset Accuracy for 74B86067-5D4B-43CF-82CF-341B76BEA0F4: 0.9541095890410959 File Updated Shape of df of user no. 38 of id 5EF64122-B513-46AE-BCF1-E62AAC285D2C is: (3911, 280) Subset Accuracy for 5EF64122-B513-46AE-BCF1-E62AAC285D2C: 0.9106002554278416 File Updated Shape of df of user no. 39 of id B7F9D634-263E-4A97-87F9-6FFB4DDCB36C is: (9383, 280) Subset Accuracy for B7F9D634-263E-4A97-87F9-6FFB4DDCB36C: 0.9355354288758657 File Updated Shape of df of user no. 40 of id A5A30F76-581E-4757-97A2-957553A2C6AA is: (1667, 280) Subset Accuracy for A5A30F76-581E-4757-97A2-957553A2C6AA: 0.8922155688622755 File Updated Shape of df of user no. 41 of id C48CE857-A0DD-4DDB-BEA5-3A25449B2153 is: (5092, 280) Subset Accuracy for C48CE857-A0DD-4DDB-BEA5-3A25449B2153: 0.9558390578999019 File Updated Shape of df of user no. 42 of id 83CF687B-7CEC-434B-9FE8-00C3D5799BE6 is: (9539, 280) Subset Accuracy for 83CF687B-7CEC-434B-9FE8-00C3D5799BE6: 0.9475890985324947 File Updated Shape of df of user no. 43 of id 0A986513-7828-4D53-AA1F-E02D6DF9561B is: (3960, 280) Subset Accuracy for 0A986513-7828-4D53-AA1F-E02D6DF9561B: 0.9570707070707071 File Updated Shape of df of user no. 44 of id 7CE37510-56D0-4120-A1CF-0E23351428D2 is: (9761, 280) Subset Accuracy for 7CE37510-56D0-4120-A1CF-0E23351428D2: 0.9406041986687148 File Updated Shape of df of user no. 
45 of id E65577C1-8D5D-4F70-AF23-B3ADB9D3DBA3 is: (3441, 280) Subset Accuracy for E65577C1-8D5D-4F70-AF23-B3ADB9D3DBA3: 0.8127721335268505 File Updated Shape of df of user no. 46 of id CCAF77F0-FABB-4F2F-9E24-D56AD0C5A82F is: (8472, 280) Subset Accuracy for CCAF77F0-FABB-4F2F-9E24-D56AD0C5A82F: 0.9669616519174041 File Updated Shape of df of user no. 47 of id CA820D43-E5E2-42EF-9798-BE56F776370B is: (7865, 280) Subset Accuracy for CA820D43-E5E2-42EF-9798-BE56F776370B: 0.8951048951048951 File Updated Shape of df of user no. 48 of id 8023FE1A-D3B0-4E2C-A57A-9321B7FC755F is: (9189, 280) Subset Accuracy for 8023FE1A-D3B0-4E2C-A57A-9321B7FC755F: 0.9515778019586507 File Updated Shape of df of user no. 49 of id 481F4DD2-7689-43B9-A2AA-C8772227162B is: (6691, 280) Subset Accuracy for 481F4DD2-7689-43B9-A2AA-C8772227162B: 0.9051530993278566 File Updated Shape of df of user no. 50 of id CDA3BBF7-6631-45E8-85BA-EEB416B32A3C is: (2860, 280) Subset Accuracy for CDA3BBF7-6631-45E8-85BA-EEB416B32A3C: 0.9912587412587412 File Updated Shape of df of user no. 51 of id 4E98F91F-4654-42EF-B908-A3389443F2E7 is: (3250, 280) Subset Accuracy for 4E98F91F-4654-42EF-B908-A3389443F2E7: 0.9661538461538461 File Updated Shape of df of user no. 52 of id ECECC2AB-D32F-4F90-B74C-E12A1C69BBE2 is: (3530, 280) Subset Accuracy for ECECC2AB-D32F-4F90-B74C-E12A1C69BBE2: 0.9631728045325779 File Updated Shape of df of user no. 53 of id B09E373F-8A54-44C8-895B-0039390B859F is: (8134, 280) Subset Accuracy for B09E373F-8A54-44C8-895B-0039390B859F: 0.9157959434542102 File Updated Shape of df of user no. 54 of id BE3CA5A6-A561-4BBD-B7C9-5DF6805400FC is: (8309, 280) Subset Accuracy for BE3CA5A6-A561-4BBD-B7C9-5DF6805400FC: 0.9446450060168472 File Updated Shape of df of user no. 55 of id 797D145F-3858-4A7F-A7C2-A4EB721E133C is: (3593, 280) Subset Accuracy for 797D145F-3858-4A7F-A7C2-A4EB721E133C: 0.8887343532684284 File Updated Shape of df of user no. 
56 of id 1DBB0F6F-1F81-4A50-9DF4-CD62ACFA4842 is: (7375, 280) Subset Accuracy for 1DBB0F6F-1F81-4A50-9DF4-CD62ACFA4842: 0.9010169491525424 File Updated Shape of df of user no. 57 of id 665514DE-49DC-421F-8DCB-145D0B2609AD is: (9167, 280) Subset Accuracy for 665514DE-49DC-421F-8DCB-145D0B2609AD: 0.9623773173391494 File Updated Shape of df of user no. 58 of id 5152A2DF-FAF3-4BA8-9CA9-E66B32671A53 is: (6617, 280) Subset Accuracy for 5152A2DF-FAF3-4BA8-9CA9-E66B32671A53: 0.9350453172205438 File Updated Shape of df of user no. 59 of id 0E6184E1-90C0-48EE-B25A-F1ECB7B9714E is: (7521, 280) Subset Accuracy for 0E6184E1-90C0-48EE-B25A-F1ECB7B9714E: 0.9408637873754153 File Updated Shape of df of user no. 60 of id 27E04243-B138-4F40-A164-F40B60165CF3 is: (4927, 280) Subset Accuracy for 27E04243-B138-4F40-A164-F40B60165CF3: 0.9655172413793104 File Updated
print(f'Shape of df is: {combined_csv_data_4_model.shape}')
# 'combined_csv_data_4_model' is the combined DataFrame prepared earlier
X = combined_csv_data_4_model[features_to_include]
y = combined_csv_data_4_model[output_columns]
# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
# Initialize Classifier Chain with a RandomForest base classifier
classifier = ClassifierChain(RandomForestClassifier())
# Train the Classifier Chain model
classifier.fit(X_train, y_train)
models_data['models']["all"] = classifier
# Make predictions
print('Processing predictions for X_test.')
predictions = classifier.predict(X_test)
def subset_accuracy(y_true, y_pred):
    return (y_true == y_pred).all(axis=1).mean()
print('Processing accuracy.')
accuracy = subset_accuracy(y_test, predictions.toarray())
models_data['accuracies']["all"] = accuracy
print(f"Subset Accuracy for all together data is: {accuracy}")
with open('all_clfs_2.pkl', 'wb') as file:
    pickle.dump(models_data, file)
print('File Updated')
Shape of df is: (377346, 280) Processing predictions for X_test. Processing accuracy. Subset Accuracy for all together data is: 0.78130382933616 File Updated
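Subset accuracy is the strictest of the multi-label metrics: a sample counts as correct only if every one of its labels is predicted exactly, which is why the combined-data score above is so much lower than most of the per-user scores. A minimal toy sketch of the metric (the `subset_accuracy` helper is repeated here so the snippet is self-contained):

```python
import numpy as np

def subset_accuracy(y_true, y_pred):
    # A sample is correct only if ALL of its labels match
    return (y_true == y_pred).all(axis=1).mean()

y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 1],   # exact match
                   [0, 1, 1],   # one label wrong -> whole sample counts as wrong
                   [1, 1, 0]])  # exact match

print(subset_accuracy(y_true, y_pred))  # 2 of 3 samples fully correct -> ~0.667
```

Note that per-label accuracy on the same toy data would be 8/9, illustrating how much harsher the subset criterion is.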
Using classifier chains to generate predictions for LSTM¶
Let's use the success of our classifier chains for individual users to generate predictions for our LSTM model. First we'll prepare the predictions generated by the classifier chains.
combined_csv_data_4_model = combined_csv_data.copy()
combined_csv_data_4_model['timestamp_numeric'] = pd.to_datetime(combined_csv_data_4_model['timestamp']).astype(np.int64) // 10**9
combined_csv_data_4_model = combined_csv_data_4_model.drop(columns=['timestamp'])
# Creating user_specific_data dictionary
user_specific_data = {}
for user in users:
    user_df = combined_csv_data_4_model[combined_csv_data_4_model['user_id'] == user]
    # Sorting user_df by 'timestamp_numeric' to ensure temporal order
    user_df = user_df.sort_values(by='timestamp_numeric')
    user_specific_data[user] = user_df
# Loading models from disk
with open('clfs_2.pkl', 'rb') as file:
    models_data = pickle.load(file)
# Defining function to generate predictions using classifier chains
def generate_classifier_chain_predictions(user_df, classifier_chain_model):
    # Normalize features (note: this fits a fresh scaler on the prediction data;
    # strictly, the scaler fitted at training time should be reused)
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(user_df[features_to_include])
    # Generate predictions
    predictions = classifier_chain_model.predict(X_scaled)
    # Return the predictions as a dense array
    return predictions.toarray()
user_predictions = {}
for user_id, user_data in user_specific_data.items():
    user_df = user_specific_data[user_id]
    classifier_chain_model = models_data['models'][user_id]  # Get the corresponding model for the user
    # Generate classifier chain predictions for the user
    user_predictions[user_id] = generate_classifier_chain_predictions(user_df, classifier_chain_model)
# Now 'user_predictions' contains predictions for each user that can be used as input for the LSTM
# Iterating through user_predictions and print the shape of each user's predictions
for user_id, predictions in user_predictions.items():
    print(f"User ID: {user_id}, Shape: {np.array(predictions).shape}")
# Checking if all predictions are 2D arrays with a consistent second dimension
consistent_shape = True
second_dim = None
for predictions in user_predictions.values():
    np_predictions = np.array(predictions)
    if second_dim is None:
        second_dim = np_predictions.shape[1] if len(np_predictions.shape) > 1 else 0
    elif len(np_predictions.shape) <= 1 or np_predictions.shape[1] != second_dim:
        consistent_shape = False
        break
if consistent_shape and second_dim:
    print(f"All predictions are 2D arrays with a consistent second dimension: {second_dim}")
else:
    print("Predictions are not consistent 2D arrays or have varying second dimensions.")
User ID: 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0, Shape: (6407, 52) User ID: 61359772-D8D8-480D-B623-7C636EAD0C81, Shape: (6079, 52) User ID: 40E170A7-607B-4578-AF04-F021C3B0384A, Shape: (7649, 52) User ID: 806289BC-AD52-4CC1-806C-0CDB14D65EB6, Shape: (9242, 52) User ID: 61976C24-1C50-4355-9C49-AAE44A7D09F6, Shape: (8730, 52) User ID: D7D20E2E-FC78-405D-B346-DBD3FD8FC92B, Shape: (6210, 52) User ID: 7D9BB102-A612-4E2A-8E22-3159752F55D8, Shape: (1600, 52) User ID: 5119D0F8-FCA8-4184-A4EB-19421A40DE0D, Shape: (6617, 52) User ID: 9DC38D04-E82E-4F29-AB52-B476535226F2, Shape: (9686, 52) User ID: A7599A50-24AE-46A6-8EA6-2576F1011D81, Shape: (3898, 52) User ID: 59EEFAE0-DEB0-4FFF-9250-54D2A03D0CF2, Shape: (7542, 52) User ID: 24E40C4C-A349-4F9F-93AB-01D00FB994AF, Shape: (4771, 52) User ID: 9759096F-1119-4E19-A0AD-6F16989C7E1C, Shape: (9959, 52) User ID: 1155FF54-63D3-4AB2-9863-8385D0BD0A13, Shape: (2685, 52) User ID: 96A358A0-FFF2-4239-B93E-C7425B901B47, Shape: (5819, 52) User ID: 78A91A4E-4A51-4065-BDA7-94755F0BB3BB, Shape: (11996, 52) User ID: F50235E0-DD67-4F2A-B00B-1F31ADA998B9, Shape: (2266, 52) User ID: 1538C99F-BA1E-4EFB-A949-6C7C47701B20, Shape: (6549, 52) User ID: 11B5EC4D-4133-4289-B475-4E737182A406, Shape: (8845, 52) User ID: 098A72A5-E3E5-4F54-A152-BBDA0DF7B694, Shape: (6813, 52) User ID: 59818CD2-24D7-4D32-B133-24C2FE3801E5, Shape: (5947, 52) User ID: 33A85C34-CFE4-4732-9E73-0A7AC861B27A, Shape: (6172, 52) User ID: 00EABED2-271D-49D8-B599-1D4A09240601, Shape: (2287, 52) User ID: 136562B6-95B2-483D-88DC-065F28409FD2, Shape: (6218, 52) User ID: B9724848-C7E2-45F4-9B3F-A1F38D864495, Shape: (7626, 52) User ID: CF722AA9-2533-4E51-9FEB-9EAC84EE9AAC, Shape: (3615, 52) User ID: FDAA70A1-42A3-4E3F-9AE3-3FDA412E03BF, Shape: (4973, 52) User ID: A5CDF89D-02A2-4EC1-89F8-F534FDABDD96, Shape: (6040, 52) User ID: 0BFC35E2-4817-4865-BFA7-764742302A2D, Shape: (3108, 52) User ID: BEF6C611-50DA-4971-A040-87FB979F3FC1, Shape: (3451, 52) User ID: 4FC32141-E888-4BFF-8804-12559A491D8C, 
Shape: (4979, 52) User ID: A76A5AF5-5A93-4CF2-A16E-62353BB70E8A, Shape: (7520, 52) User ID: 3600D531-0C55-44A7-AE95-A7A38519464E, Shape: (5203, 52) User ID: 2C32C23E-E30C-498A-8DD2-0EFB9150A02E, Shape: (8516, 52) User ID: 86A4F379-B305-473D-9D83-FC7D800180EF, Shape: (10738, 52) User ID: 99B204C0-DD5C-4BB7-83E8-A37281B8D769, Shape: (6038, 52) User ID: 74B86067-5D4B-43CF-82CF-341B76BEA0F4, Shape: (7298, 52) User ID: 5EF64122-B513-46AE-BCF1-E62AAC285D2C, Shape: (3911, 52) User ID: B7F9D634-263E-4A97-87F9-6FFB4DDCB36C, Shape: (9383, 52) User ID: A5A30F76-581E-4757-97A2-957553A2C6AA, Shape: (1667, 52) User ID: C48CE857-A0DD-4DDB-BEA5-3A25449B2153, Shape: (5092, 52) User ID: 83CF687B-7CEC-434B-9FE8-00C3D5799BE6, Shape: (9539, 52) User ID: 0A986513-7828-4D53-AA1F-E02D6DF9561B, Shape: (3960, 52) User ID: 7CE37510-56D0-4120-A1CF-0E23351428D2, Shape: (9761, 52) User ID: E65577C1-8D5D-4F70-AF23-B3ADB9D3DBA3, Shape: (3441, 52) User ID: CCAF77F0-FABB-4F2F-9E24-D56AD0C5A82F, Shape: (8472, 52) User ID: CA820D43-E5E2-42EF-9798-BE56F776370B, Shape: (7865, 52) User ID: 8023FE1A-D3B0-4E2C-A57A-9321B7FC755F, Shape: (9189, 52) User ID: 481F4DD2-7689-43B9-A2AA-C8772227162B, Shape: (6691, 52) User ID: CDA3BBF7-6631-45E8-85BA-EEB416B32A3C, Shape: (2860, 52) User ID: 4E98F91F-4654-42EF-B908-A3389443F2E7, Shape: (3250, 52) User ID: ECECC2AB-D32F-4F90-B74C-E12A1C69BBE2, Shape: (3530, 52) User ID: B09E373F-8A54-44C8-895B-0039390B859F, Shape: (8134, 52) User ID: BE3CA5A6-A561-4BBD-B7C9-5DF6805400FC, Shape: (8309, 52) User ID: 797D145F-3858-4A7F-A7C2-A4EB721E133C, Shape: (3593, 52) User ID: 1DBB0F6F-1F81-4A50-9DF4-CD62ACFA4842, Shape: (7375, 52) User ID: 665514DE-49DC-421F-8DCB-145D0B2609AD, Shape: (9167, 52) User ID: 5152A2DF-FAF3-4BA8-9CA9-E66B32671A53, Shape: (6617, 52) User ID: 0E6184E1-90C0-48EE-B25A-F1ECB7B9714E, Shape: (7521, 52) User ID: 27E04243-B138-4F40-A164-F40B60165CF3, Shape: (4927, 52) All predictions are 2D arrays with a consistent second dimension: 52
# Determine the length of the longest sequence
max_sequence_length = max([len(predictions) for predictions in user_predictions.values()])
# Pad sequences to have the same length and stack them
X_lstm = pad_sequences(list(user_predictions.values()), maxlen=max_sequence_length, padding='post', dtype='float64')
# Since we are using padding, we need to keep track of the original lengths of each user's predictions
# This will be useful when interpreting the model's predictions
original_lengths = [len(predictions) for predictions in user_predictions.values()]
# Reshape the data to add a feature dimension (required by Conv1D layers, if used)
X_lstm = X_lstm.reshape((X_lstm.shape[0], X_lstm.shape[1], 52))
print(f"LSTM input shape: {X_lstm.shape}")
LSTM input shape: (60, 11996, 52)
# Creating user_specific_timestamps dictionary
user_specific_timestamps = {}
for user in users:
    # Retrieve the user's dataframe including the timestamp
    user_df = combined_csv_data[combined_csv_data['user_id'] == user]
    # Sort user_df by 'timestamp' to ensure temporal order
    user_df = user_df.sort_values(by='timestamp')
    # Extract and store the timestamps for the user
    user_specific_timestamps[user] = user_df['timestamp'].values
# Now 'user_specific_timestamps' contains the ordered timestamps for each user
# Pad the timestamps to have the same length as the sequences
padded_timestamps = pad_sequences(list(user_specific_timestamps.values()), maxlen=max_sequence_length, padding='post', value=0.0, dtype='float64')  # pad_sequences needs a numeric fill value; 0.0 marks the non-real timestamp slots
# Keep track of the original lengths to filter out the padded timestamps later
original_timestamp_lengths = [len(ts) for ts in user_specific_timestamps.values()]
Now let's prepare the LSTM model.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.optimizers import Adam
model = Sequential([
    LSTM(50, input_shape=(X_lstm.shape[1], X_lstm.shape[2]), return_sequences=True),
    Dropout(0.5),
    LSTM(50, return_sequences=False),
    Dropout(0.5),
    Dense(100, activation='relu'),
    Dense(len(label_columns), activation='sigmoid')
])
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_2 (LSTM) (None, 11996, 50) 20600
dropout_2 (Dropout) (None, 11996, 50) 0
lstm_3 (LSTM) (None, 50) 20200
dropout_3 (Dropout) (None, 50) 0
dense_2 (Dense) (None, 100) 5100
dense_3 (Dense) (None, 52) 5252
=================================================================
Total params: 51152 (199.81 KB)
Trainable params: 51152 (199.81 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
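The parameter counts in the summary can be verified by hand: a Keras LSTM layer has `4 * (units * input_dim + units * units + units)` weights (four gates, each with an input kernel, a recurrent kernel, and a bias), and a Dense layer has `input_dim * units + units`. A quick check against the summary above:

```python
def lstm_params(input_dim, units):
    # Four gates, each with an input kernel, a recurrent kernel, and a bias
    return 4 * (units * input_dim + units * units + units)

def dense_params(input_dim, units):
    # Weight matrix plus bias vector
    return input_dim * units + units

total = (lstm_params(52, 50)       # lstm_2:  20600
         + lstm_params(50, 50)     # lstm_3:  20200
         + dense_params(50, 100)   # dense_2:  5100
         + dense_params(100, 52))  # dense_3:  5252
print(total)  # 51152, matching the model summary
```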
user_labels = {}
for user_id in users:  # Assuming 'users' is a list of all user IDs
    # Extract labels for the current user
    # This assumes you have a way to select labels for each user similar to how you selected their data
    user_labels_df = combined_csv_data[combined_csv_data['user_id'] == user_id]
    # Adjust the column selection as necessary to match your actual label columns
    labels_array = user_labels_df[label_columns].values
    user_labels[user_id] = labels_array
for user_id, labels in user_labels.items():
    print(f"User ID: {user_id}, Labels shape: {labels.shape}")
User ID: 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0, Labels shape: (6407, 52) User ID: 61359772-D8D8-480D-B623-7C636EAD0C81, Labels shape: (6079, 52) User ID: 40E170A7-607B-4578-AF04-F021C3B0384A, Labels shape: (7649, 52) User ID: 806289BC-AD52-4CC1-806C-0CDB14D65EB6, Labels shape: (9242, 52) User ID: 61976C24-1C50-4355-9C49-AAE44A7D09F6, Labels shape: (8730, 52) User ID: D7D20E2E-FC78-405D-B346-DBD3FD8FC92B, Labels shape: (6210, 52) User ID: 7D9BB102-A612-4E2A-8E22-3159752F55D8, Labels shape: (1600, 52) User ID: 5119D0F8-FCA8-4184-A4EB-19421A40DE0D, Labels shape: (6617, 52) User ID: 9DC38D04-E82E-4F29-AB52-B476535226F2, Labels shape: (9686, 52) User ID: A7599A50-24AE-46A6-8EA6-2576F1011D81, Labels shape: (3898, 52) User ID: 59EEFAE0-DEB0-4FFF-9250-54D2A03D0CF2, Labels shape: (7542, 52) User ID: 24E40C4C-A349-4F9F-93AB-01D00FB994AF, Labels shape: (4771, 52) User ID: 9759096F-1119-4E19-A0AD-6F16989C7E1C, Labels shape: (9959, 52) User ID: 1155FF54-63D3-4AB2-9863-8385D0BD0A13, Labels shape: (2685, 52) User ID: 96A358A0-FFF2-4239-B93E-C7425B901B47, Labels shape: (5819, 52) User ID: 78A91A4E-4A51-4065-BDA7-94755F0BB3BB, Labels shape: (11996, 52) User ID: F50235E0-DD67-4F2A-B00B-1F31ADA998B9, Labels shape: (2266, 52) User ID: 1538C99F-BA1E-4EFB-A949-6C7C47701B20, Labels shape: (6549, 52) User ID: 11B5EC4D-4133-4289-B475-4E737182A406, Labels shape: (8845, 52) User ID: 098A72A5-E3E5-4F54-A152-BBDA0DF7B694, Labels shape: (6813, 52) User ID: 59818CD2-24D7-4D32-B133-24C2FE3801E5, Labels shape: (5947, 52) User ID: 33A85C34-CFE4-4732-9E73-0A7AC861B27A, Labels shape: (6172, 52) User ID: 00EABED2-271D-49D8-B599-1D4A09240601, Labels shape: (2287, 52) User ID: 136562B6-95B2-483D-88DC-065F28409FD2, Labels shape: (6218, 52) User ID: B9724848-C7E2-45F4-9B3F-A1F38D864495, Labels shape: (7626, 52) User ID: CF722AA9-2533-4E51-9FEB-9EAC84EE9AAC, Labels shape: (3615, 52) User ID: FDAA70A1-42A3-4E3F-9AE3-3FDA412E03BF, Labels shape: (4973, 52) User ID: A5CDF89D-02A2-4EC1-89F8-F534FDABDD96, Labels 
shape: (6040, 52) User ID: 0BFC35E2-4817-4865-BFA7-764742302A2D, Labels shape: (3108, 52) User ID: BEF6C611-50DA-4971-A040-87FB979F3FC1, Labels shape: (3451, 52) User ID: 4FC32141-E888-4BFF-8804-12559A491D8C, Labels shape: (4979, 52) User ID: A76A5AF5-5A93-4CF2-A16E-62353BB70E8A, Labels shape: (7520, 52) User ID: 3600D531-0C55-44A7-AE95-A7A38519464E, Labels shape: (5203, 52) User ID: 2C32C23E-E30C-498A-8DD2-0EFB9150A02E, Labels shape: (8516, 52) User ID: 86A4F379-B305-473D-9D83-FC7D800180EF, Labels shape: (10738, 52) User ID: 99B204C0-DD5C-4BB7-83E8-A37281B8D769, Labels shape: (6038, 52) User ID: 74B86067-5D4B-43CF-82CF-341B76BEA0F4, Labels shape: (7298, 52) User ID: 5EF64122-B513-46AE-BCF1-E62AAC285D2C, Labels shape: (3911, 52) User ID: B7F9D634-263E-4A97-87F9-6FFB4DDCB36C, Labels shape: (9383, 52) User ID: A5A30F76-581E-4757-97A2-957553A2C6AA, Labels shape: (1667, 52) User ID: C48CE857-A0DD-4DDB-BEA5-3A25449B2153, Labels shape: (5092, 52) User ID: 83CF687B-7CEC-434B-9FE8-00C3D5799BE6, Labels shape: (9539, 52) User ID: 0A986513-7828-4D53-AA1F-E02D6DF9561B, Labels shape: (3960, 52) User ID: 7CE37510-56D0-4120-A1CF-0E23351428D2, Labels shape: (9761, 52) User ID: E65577C1-8D5D-4F70-AF23-B3ADB9D3DBA3, Labels shape: (3441, 52) User ID: CCAF77F0-FABB-4F2F-9E24-D56AD0C5A82F, Labels shape: (8472, 52) User ID: CA820D43-E5E2-42EF-9798-BE56F776370B, Labels shape: (7865, 52) User ID: 8023FE1A-D3B0-4E2C-A57A-9321B7FC755F, Labels shape: (9189, 52) User ID: 481F4DD2-7689-43B9-A2AA-C8772227162B, Labels shape: (6691, 52) User ID: CDA3BBF7-6631-45E8-85BA-EEB416B32A3C, Labels shape: (2860, 52) User ID: 4E98F91F-4654-42EF-B908-A3389443F2E7, Labels shape: (3250, 52) User ID: ECECC2AB-D32F-4F90-B74C-E12A1C69BBE2, Labels shape: (3530, 52) User ID: B09E373F-8A54-44C8-895B-0039390B859F, Labels shape: (8134, 52) User ID: BE3CA5A6-A561-4BBD-B7C9-5DF6805400FC, Labels shape: (8309, 52) User ID: 797D145F-3858-4A7F-A7C2-A4EB721E133C, Labels shape: (3593, 52) User ID: 
1DBB0F6F-1F81-4A50-9DF4-CD62ACFA4842, Labels shape: (7375, 52) User ID: 665514DE-49DC-421F-8DCB-145D0B2609AD, Labels shape: (9167, 52) User ID: 5152A2DF-FAF3-4BA8-9CA9-E66B32671A53, Labels shape: (6617, 52) User ID: 0E6184E1-90C0-48EE-B25A-F1ECB7B9714E, Labels shape: (7521, 52) User ID: 27E04243-B138-4F40-A164-F40B60165CF3, Labels shape: (4927, 52)
padded_labels = []
# Loop over each user's labels
for user_id, labels in user_labels.items():
    # Pad the user's label array to have the same length as max_sequence_length
    # We use the same 'post' padding to align with the input sequences
    padded_label = pad_sequences([labels], maxlen=max_sequence_length, padding='post', dtype='float64')[0]
    padded_labels.append(padded_label)
# Convert the list of padded label arrays into a single NumPy array
y_lstm = np.array(padded_labels)
print(f"Padded labels shape: {y_lstm.shape}")
Padded labels shape: (60, 11996, 52)
# Each sequence should map to a single set of labels rather than one per timestamp,
# so reduce y_lstm to two dimensions: (number of samples, number of labels).
# Note: this keeps only each user's first-timestep labels.
y_lstm = y_lstm[:, 0, :]
print(f"Adjusted labels shape: {y_lstm.shape}")
Adjusted labels shape: (60, 52)
X_train, X_val, y_train, y_val = train_test_split(X_lstm, y_lstm, test_size=0.2, random_state=42)
# Training model
history = model.fit(
    X_train,
    y_train,
    epochs=10,
    batch_size=64,
    validation_data=(X_val, y_val),
    verbose=1
)
# Evaluating model
val_loss, val_acc = model.evaluate(X_val, y_val, verbose=0)
print(f'Validation accuracy: {val_acc}, Validation loss: {val_loss}')
Epoch 1/10 1/1 [==============================] - 23s 23s/step - loss: 0.6930 - accuracy: 0.0000e+00 - val_loss: 0.6921 - val_accuracy: 0.0000e+00 Epoch 2/10 1/1 [==============================] - 15s 15s/step - loss: 0.6920 - accuracy: 0.0417 - val_loss: 0.6908 - val_accuracy: 0.0000e+00 Epoch 3/10 1/1 [==============================] - 13s 13s/step - loss: 0.6907 - accuracy: 0.0417 - val_loss: 0.6892 - val_accuracy: 0.0000e+00 Epoch 4/10 1/1 [==============================] - 15s 15s/step - loss: 0.6892 - accuracy: 0.0625 - val_loss: 0.6873 - val_accuracy: 0.0000e+00 Epoch 5/10 1/1 [==============================] - 15s 15s/step - loss: 0.6868 - accuracy: 0.0417 - val_loss: 0.6850 - val_accuracy: 0.0000e+00 Epoch 6/10 1/1 [==============================] - 13s 13s/step - loss: 0.6844 - accuracy: 0.0417 - val_loss: 0.6820 - val_accuracy: 0.0000e+00 Epoch 7/10 1/1 [==============================] - 13s 13s/step - loss: 0.6815 - accuracy: 0.0000e+00 - val_loss: 0.6782 - val_accuracy: 0.0000e+00 Epoch 8/10 1/1 [==============================] - 13s 13s/step - loss: 0.6773 - accuracy: 0.0000e+00 - val_loss: 0.6733 - val_accuracy: 0.0000e+00 Epoch 9/10 1/1 [==============================] - 13s 13s/step - loss: 0.6728 - accuracy: 0.0208 - val_loss: 0.6669 - val_accuracy: 0.0000e+00 Epoch 10/10 1/1 [==============================] - 13s 13s/step - loss: 0.6649 - accuracy: 0.0000e+00 - val_loss: 0.6582 - val_accuracy: 0.0000e+00 Validation accuracy: 0.0, Validation loss: 0.6582168936729431
# Examining predictions from model
# Generating predictions for the validation set
predictions = model.predict(X_val)
# Applying a threshold to convert probabilities to binary values
binary_predictions = (predictions > 0.5).astype(int)
# Let's examine a few predictions to get a sense of what our model is doing
for i, prediction in enumerate(binary_predictions[:5]):
    print(f"Prediction for sample {i}: {prediction}")
1/1 [==============================] - 1s 1s/step Prediction for sample 0: [0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0] Prediction for sample 1: [0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0] Prediction for sample 2: [0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0] Prediction for sample 3: [0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0] Prediction for sample 4: [0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]
# Let's map these to the label names so we know what the predictions mean
for i, prediction in enumerate(binary_predictions[:5]):
    labeled_prediction = dict(zip(label_columns, prediction))
    print(f"Prediction for sample {i}: {labeled_prediction}")
Prediction for sample 0: {'label:LYING_DOWN': 0, 'label:SITTING': 0, 'label:FIX_walking': 0, 'label:FIX_running': 0, 'label:BICYCLING': 0, 'label:SLEEPING': 0, 'label:LAB_WORK': 0, 'label:IN_CLASS': 0, 'label:IN_A_MEETING': 1, 'label:LOC_main_workplace': 0, 'label:OR_indoors': 0, 'label:OR_outside': 1, 'label:IN_A_CAR': 0, 'label:ON_A_BUS': 0, 'label:DRIVE_-_I_M_THE_DRIVER': 0, 'label:DRIVE_-_I_M_A_PASSENGER': 0, 'label:LOC_home': 0, 'label:FIX_restaurant': 0, 'label:PHONE_IN_POCKET': 0, 'label:OR_exercise': 0, 'label:COOKING': 0, 'label:SHOPPING': 0, 'label:STROLLING': 1, 'label:DRINKING__ALCOHOL_': 0, 'label:BATHING_-_SHOWER': 0, 'label:CLEANING': 0, 'label:DOING_LAUNDRY': 0, 'label:WASHING_DISHES': 0, 'label:WATCHING_TV': 0, 'label:SURFING_THE_INTERNET': 0, 'label:AT_A_PARTY': 0, 'label:AT_A_BAR': 0, 'label:LOC_beach': 0, 'label:SINGING': 0, 'label:TALKING': 0, 'label:COMPUTER_WORK': 0, 'label:EATING': 0, 'label:TOILET': 0, 'label:GROOMING': 1, 'label:DRESSING': 0, 'label:AT_THE_GYM': 0, 'label:STAIRS_-_GOING_UP': 0, 'label:STAIRS_-_GOING_DOWN': 0, 'label:ELEVATOR': 0, 'label:OR_standing': 0, 'label:AT_SCHOOL': 0, 'label:PHONE_IN_HAND': 0, 'label:PHONE_IN_BAG': 0, 'label:PHONE_ON_TABLE': 0, 'label:WITH_CO-WORKERS': 0, 'label:WITH_FRIENDS': 0, 'label:UNKNOWN': 0}
Prediction for sample 1: {'label:LYING_DOWN': 0, 'label:SITTING': 0, 'label:FIX_walking': 0, 'label:FIX_running': 0, 'label:BICYCLING': 0, 'label:SLEEPING': 0, 'label:LAB_WORK': 0, 'label:IN_CLASS': 0, 'label:IN_A_MEETING': 1, 'label:LOC_main_workplace': 0, 'label:OR_indoors': 0, 'label:OR_outside': 1, 'label:IN_A_CAR': 0, 'label:ON_A_BUS': 0, 'label:DRIVE_-_I_M_THE_DRIVER': 0, 'label:DRIVE_-_I_M_A_PASSENGER': 0, 'label:LOC_home': 0, 'label:FIX_restaurant': 0, 'label:PHONE_IN_POCKET': 0, 'label:OR_exercise': 0, 'label:COOKING': 0, 'label:SHOPPING': 0, 'label:STROLLING': 1, 'label:DRINKING__ALCOHOL_': 0, 'label:BATHING_-_SHOWER': 0, 'label:CLEANING': 0, 'label:DOING_LAUNDRY': 0, 'label:WASHING_DISHES': 0, 'label:WATCHING_TV': 0, 'label:SURFING_THE_INTERNET': 0, 'label:AT_A_PARTY': 0, 'label:AT_A_BAR': 0, 'label:LOC_beach': 0, 'label:SINGING': 0, 'label:TALKING': 0, 'label:COMPUTER_WORK': 0, 'label:EATING': 0, 'label:TOILET': 0, 'label:GROOMING': 1, 'label:DRESSING': 0, 'label:AT_THE_GYM': 0, 'label:STAIRS_-_GOING_UP': 0, 'label:STAIRS_-_GOING_DOWN': 0, 'label:ELEVATOR': 0, 'label:OR_standing': 0, 'label:AT_SCHOOL': 0, 'label:PHONE_IN_HAND': 0, 'label:PHONE_IN_BAG': 0, 'label:PHONE_ON_TABLE': 0, 'label:WITH_CO-WORKERS': 0, 'label:WITH_FRIENDS': 0, 'label:UNKNOWN': 0}
Prediction for sample 2: {'label:LYING_DOWN': 0, 'label:SITTING': 0, 'label:FIX_walking': 0, 'label:FIX_running': 0, 'label:BICYCLING': 0, 'label:SLEEPING': 0, 'label:LAB_WORK': 0, 'label:IN_CLASS': 0, 'label:IN_A_MEETING': 1, 'label:LOC_main_workplace': 0, 'label:OR_indoors': 0, 'label:OR_outside': 1, 'label:IN_A_CAR': 0, 'label:ON_A_BUS': 0, 'label:DRIVE_-_I_M_THE_DRIVER': 0, 'label:DRIVE_-_I_M_A_PASSENGER': 0, 'label:LOC_home': 0, 'label:FIX_restaurant': 0, 'label:PHONE_IN_POCKET': 0, 'label:OR_exercise': 0, 'label:COOKING': 0, 'label:SHOPPING': 0, 'label:STROLLING': 1, 'label:DRINKING__ALCOHOL_': 0, 'label:BATHING_-_SHOWER': 0, 'label:CLEANING': 0, 'label:DOING_LAUNDRY': 0, 'label:WASHING_DISHES': 0, 'label:WATCHING_TV': 0, 'label:SURFING_THE_INTERNET': 0, 'label:AT_A_PARTY': 0, 'label:AT_A_BAR': 0, 'label:LOC_beach': 0, 'label:SINGING': 0, 'label:TALKING': 0, 'label:COMPUTER_WORK': 0, 'label:EATING': 0, 'label:TOILET': 0, 'label:GROOMING': 1, 'label:DRESSING': 0, 'label:AT_THE_GYM': 0, 'label:STAIRS_-_GOING_UP': 0, 'label:STAIRS_-_GOING_DOWN': 0, 'label:ELEVATOR': 0, 'label:OR_standing': 0, 'label:AT_SCHOOL': 0, 'label:PHONE_IN_HAND': 0, 'label:PHONE_IN_BAG': 0, 'label:PHONE_ON_TABLE': 0, 'label:WITH_CO-WORKERS': 0, 'label:WITH_FRIENDS': 0, 'label:UNKNOWN': 0}
Prediction for sample 3: {'label:LYING_DOWN': 0, 'label:SITTING': 0, 'label:FIX_walking': 0, 'label:FIX_running': 0, 'label:BICYCLING': 0, 'label:SLEEPING': 0, 'label:LAB_WORK': 0, 'label:IN_CLASS': 0, 'label:IN_A_MEETING': 1, 'label:LOC_main_workplace': 0, 'label:OR_indoors': 0, 'label:OR_outside': 1, 'label:IN_A_CAR': 0, 'label:ON_A_BUS': 0, 'label:DRIVE_-_I_M_THE_DRIVER': 0, 'label:DRIVE_-_I_M_A_PASSENGER': 0, 'label:LOC_home': 0, 'label:FIX_restaurant': 0, 'label:PHONE_IN_POCKET': 0, 'label:OR_exercise': 0, 'label:COOKING': 0, 'label:SHOPPING': 0, 'label:STROLLING': 1, 'label:DRINKING__ALCOHOL_': 0, 'label:BATHING_-_SHOWER': 0, 'label:CLEANING': 0, 'label:DOING_LAUNDRY': 0, 'label:WASHING_DISHES': 0, 'label:WATCHING_TV': 0, 'label:SURFING_THE_INTERNET': 0, 'label:AT_A_PARTY': 0, 'label:AT_A_BAR': 0, 'label:LOC_beach': 0, 'label:SINGING': 0, 'label:TALKING': 0, 'label:COMPUTER_WORK': 0, 'label:EATING': 0, 'label:TOILET': 0, 'label:GROOMING': 1, 'label:DRESSING': 0, 'label:AT_THE_GYM': 0, 'label:STAIRS_-_GOING_UP': 0, 'label:STAIRS_-_GOING_DOWN': 0, 'label:ELEVATOR': 0, 'label:OR_standing': 0, 'label:AT_SCHOOL': 0, 'label:PHONE_IN_HAND': 0, 'label:PHONE_IN_BAG': 0, 'label:PHONE_ON_TABLE': 0, 'label:WITH_CO-WORKERS': 0, 'label:WITH_FRIENDS': 0, 'label:UNKNOWN': 0}
Prediction for sample 4: {'label:LYING_DOWN': 0, 'label:SITTING': 0, 'label:FIX_walking': 0, 'label:FIX_running': 0, 'label:BICYCLING': 0, 'label:SLEEPING': 0, 'label:LAB_WORK': 0, 'label:IN_CLASS': 0, 'label:IN_A_MEETING': 1, 'label:LOC_main_workplace': 0, 'label:OR_indoors': 0, 'label:OR_outside': 1, 'label:IN_A_CAR': 0, 'label:ON_A_BUS': 0, 'label:DRIVE_-_I_M_THE_DRIVER': 0, 'label:DRIVE_-_I_M_A_PASSENGER': 0, 'label:LOC_home': 0, 'label:FIX_restaurant': 0, 'label:PHONE_IN_POCKET': 0, 'label:OR_exercise': 0, 'label:COOKING': 0, 'label:SHOPPING': 0, 'label:STROLLING': 1, 'label:DRINKING__ALCOHOL_': 0, 'label:BATHING_-_SHOWER': 0, 'label:CLEANING': 0, 'label:DOING_LAUNDRY': 0, 'label:WASHING_DISHES': 0, 'label:WATCHING_TV': 0, 'label:SURFING_THE_INTERNET': 0, 'label:AT_A_PARTY': 0, 'label:AT_A_BAR': 0, 'label:LOC_beach': 0, 'label:SINGING': 0, 'label:TALKING': 0, 'label:COMPUTER_WORK': 0, 'label:EATING': 0, 'label:TOILET': 0, 'label:GROOMING': 1, 'label:DRESSING': 0, 'label:AT_THE_GYM': 0, 'label:STAIRS_-_GOING_UP': 0, 'label:STAIRS_-_GOING_DOWN': 0, 'label:ELEVATOR': 0, 'label:OR_standing': 0, 'label:AT_SCHOOL': 0, 'label:PHONE_IN_HAND': 0, 'label:PHONE_IN_BAG': 0, 'label:PHONE_ON_TABLE': 0, 'label:WITH_CO-WORKERS': 0, 'label:WITH_FRIENDS': 0, 'label:UNKNOWN': 0}
# Load all predictions into a dataframe
predictions_df = pd.DataFrame(binary_predictions, columns=label_columns)
# Save the DataFrame to a CSV file with label names as column headers
predictions_df.to_csv('LSTM_model_predictions_with_labels.csv', index=False)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
X = y_lstm
y = X_lstm.reshape(X_lstm.shape[0], -1)
# Splitting the dataset for training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Training the linear model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
# Making predictions - predicting features based on labels
y_pred = linear_model.predict(X_test)
# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
Mean Squared Error: 0.04169533805008713
predictions_df = pd.DataFrame(y_pred, columns=[f'Feature_{i}' for i in range(y_pred.shape[1])])
predictions_df.to_csv('predicted_features.csv', index=False)
print("Predictions saved to 'predicted_features.csv'.")
Predictions saved to 'predicted_features.csv'.
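The single MSE above averages over every flattened feature dimension, which hides which features are actually recoverable from the labels. sklearn's `mean_squared_error` can report one value per output with `multioutput='raw_values'`; a self-contained sketch on synthetic stand-in data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)

# Synthetic stand-ins: binary label vectors in, continuous features out,
# mirroring the reversed labels -> features mapping used above.
X = rng.integers(0, 2, size=(200, 5)).astype(float)
true_w = rng.normal(size=(5, 3))
y = X @ true_w + 0.01 * rng.normal(size=(200, 3))

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# One MSE per predicted feature instead of a single pooled number.
per_feature_mse = mean_squared_error(y, y_pred, multioutput='raw_values')
print(per_feature_mse.shape)  # (3,)
```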
Using a threshold of 0 keeps every label active, returning all candidate next-X predictions.
END OF NEW LSTM PREDICTION¶
Old numbers: shape of df is (377346, 280). Processing predictions for X_test. Processing accuracy. Subset accuracy for all data together: 0.40275606201139524. File updated.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
import pickle
# Define your MLP model
class MultiTaskMLP(nn.Module):
def __init__(self, input_size, output_size):
super(MultiTaskMLP, self).__init__()
self.layer1 = nn.Linear(input_size, 64)
self.layer2 = nn.Linear(64, 64)
self.output_layer = nn.Linear(64, output_size)
def forward(self, x):
x = torch.relu(self.layer1(x))
x = torch.relu(self.layer2(x))
x = self.output_layer(x)
return x
# Placeholder for models and accuracies
models_data_2 = {
'models': {},
'accuracies': {}
}
# Loop through each user
for user in users:
user_df = combined_csv_data_4_model[combined_csv_data_4_model['user_id'] == user]
X = user_df[features_to_include].values
y = user_df[output_columns].values
# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
# Convert to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train)
y_train_tensor = torch.FloatTensor(y_train)
X_test_tensor = torch.FloatTensor(X_test)
y_test_tensor = torch.FloatTensor(y_test)
# DataLoader
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
# Initialize the model
model = MultiTaskMLP(input_size=X_train_tensor.shape[1], output_size=y_train_tensor.shape[1])
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop
for epoch in range(10): # Adjust epochs as needed
model.train()
for inputs, labels in train_loader:
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
# Evaluation
model.eval()
correct = 0
total = 0
with torch.no_grad():
for inputs, labels in test_loader:
outputs = model(inputs)
predicted = torch.sigmoid(outputs) > 0.5 # Threshold at 0.5
total += labels.size(0)
# NOTE: this accumulates per-batch means of elementwise matches but then
# divides by the sample count, so the reported "accuracy" is deflated
# (~0.015 below); it is neither subset accuracy nor per-label accuracy.
correct += (predicted == labels).float().mean()
accuracy = correct / total
# Store the model and accuracy
models_data_2['models'][user] = model.state_dict() # Store state dict for minimal size
models_data_2['accuracies'][user] = accuracy.item()
print(f"User {user}: Accuracy = {accuracy.item():.4f}")
# Save the models and accuracies
with open('mlp_models.pkl', 'wb') as file:
pickle.dump(models_data_2, file)
User 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0: Accuracy = 0.0161 User 61359772-D8D8-480D-B623-7C636EAD0C81: Accuracy = 0.0151 User 40E170A7-607B-4578-AF04-F021C3B0384A: Accuracy = 0.0154 User 806289BC-AD52-4CC1-806C-0CDB14D65EB6: Accuracy = 0.0150 User 61976C24-1C50-4355-9C49-AAE44A7D09F6: Accuracy = 0.0155 User D7D20E2E-FC78-405D-B346-DBD3FD8FC92B: Accuracy = 0.0156 User 7D9BB102-A612-4E2A-8E22-3159752F55D8: Accuracy = 0.0149 User 5119D0F8-FCA8-4184-A4EB-19421A40DE0D: Accuracy = 0.0153 User 9DC38D04-E82E-4F29-AB52-B476535226F2: Accuracy = 0.0153 User A7599A50-24AE-46A6-8EA6-2576F1011D81: Accuracy = 0.0161 User 59EEFAE0-DEB0-4FFF-9250-54D2A03D0CF2: Accuracy = 0.0154 User 24E40C4C-A349-4F9F-93AB-01D00FB994AF: Accuracy = 0.0151 User 9759096F-1119-4E19-A0AD-6F16989C7E1C: Accuracy = 0.0157 User 1155FF54-63D3-4AB2-9863-8385D0BD0A13: Accuracy = 0.0162 User 96A358A0-FFF2-4239-B93E-C7425B901B47: Accuracy = 0.0160 User 78A91A4E-4A51-4065-BDA7-94755F0BB3BB: Accuracy = 0.0156 User F50235E0-DD67-4F2A-B00B-1F31ADA998B9: Accuracy = 0.0171 User 1538C99F-BA1E-4EFB-A949-6C7C47701B20: Accuracy = 0.0156 User 11B5EC4D-4133-4289-B475-4E737182A406: Accuracy = 0.0154 User 098A72A5-E3E5-4F54-A152-BBDA0DF7B694: Accuracy = 0.0158 User 59818CD2-24D7-4D32-B133-24C2FE3801E5: Accuracy = 0.0157 User 33A85C34-CFE4-4732-9E73-0A7AC861B27A: Accuracy = 0.0157 User 00EABED2-271D-49D8-B599-1D4A09240601: Accuracy = 0.0170 User 136562B6-95B2-483D-88DC-065F28409FD2: Accuracy = 0.0156 User B9724848-C7E2-45F4-9B3F-A1F38D864495: Accuracy = 0.0152 User CF722AA9-2533-4E51-9FEB-9EAC84EE9AAC: Accuracy = 0.0156 User FDAA70A1-42A3-4E3F-9AE3-3FDA412E03BF: Accuracy = 0.0158 User A5CDF89D-02A2-4EC1-89F8-F534FDABDD96: Accuracy = 0.0152 User 0BFC35E2-4817-4865-BFA7-764742302A2D: Accuracy = 0.0156 User BEF6C611-50DA-4971-A040-87FB979F3FC1: Accuracy = 0.0156 User 4FC32141-E888-4BFF-8804-12559A491D8C: Accuracy = 0.0156 User A76A5AF5-5A93-4CF2-A16E-62353BB70E8A: Accuracy = 0.0155 User 3600D531-0C55-44A7-AE95-A7A38519464E: 
Accuracy = 0.0158 User 2C32C23E-E30C-498A-8DD2-0EFB9150A02E: Accuracy = 0.0153 User 86A4F379-B305-473D-9D83-FC7D800180EF: Accuracy = 0.0156 User 99B204C0-DD5C-4BB7-83E8-A37281B8D769: Accuracy = 0.0153 User 74B86067-5D4B-43CF-82CF-341B76BEA0F4: Accuracy = 0.0154 User 5EF64122-B513-46AE-BCF1-E62AAC285D2C: Accuracy = 0.0161 User B7F9D634-263E-4A97-87F9-6FFB4DDCB36C: Accuracy = 0.0156 User A5A30F76-581E-4757-97A2-957553A2C6AA: Accuracy = 0.0174 User C48CE857-A0DD-4DDB-BEA5-3A25449B2153: Accuracy = 0.0151 User 83CF687B-7CEC-434B-9FE8-00C3D5799BE6: Accuracy = 0.0153 User 0A986513-7828-4D53-AA1F-E02D6DF9561B: Accuracy = 0.0160 User 7CE37510-56D0-4120-A1CF-0E23351428D2: Accuracy = 0.0153 User E65577C1-8D5D-4F70-AF23-B3ADB9D3DBA3: Accuracy = 0.0156 User CCAF77F0-FABB-4F2F-9E24-D56AD0C5A82F: Accuracy = 0.0156 User CA820D43-E5E2-42EF-9798-BE56F776370B: Accuracy = 0.0155 User 8023FE1A-D3B0-4E2C-A57A-9321B7FC755F: Accuracy = 0.0153 User 481F4DD2-7689-43B9-A2AA-C8772227162B: Accuracy = 0.0152 User CDA3BBF7-6631-45E8-85BA-EEB416B32A3C: Accuracy = 0.0154 User 4E98F91F-4654-42EF-B908-A3389443F2E7: Accuracy = 0.0165 User ECECC2AB-D32F-4F90-B74C-E12A1C69BBE2: Accuracy = 0.0165 User B09E373F-8A54-44C8-895B-0039390B859F: Accuracy = 0.0155 User BE3CA5A6-A561-4BBD-B7C9-5DF6805400FC: Accuracy = 0.0153 User 797D145F-3858-4A7F-A7C2-A4EB721E133C: Accuracy = 0.0163 User 1DBB0F6F-1F81-4A50-9DF4-CD62ACFA4842: Accuracy = 0.0156 User 665514DE-49DC-421F-8DCB-145D0B2609AD: Accuracy = 0.0153 User 5152A2DF-FAF3-4BA8-9CA9-E66B32671A53: Accuracy = 0.0153 User 0E6184E1-90C0-48EE-B25A-F1ECB7B9714E: Accuracy = 0.0154 User 27E04243-B138-4F40-A164-F40B60165CF3: Accuracy = 0.0157
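The per-user accuracies above are hard to interpret because the evaluation loop mixes per-batch means with a sample count. For multi-label outputs, exact-match (subset) accuracy, the metric reported elsewhere in this notebook, can be computed directly; a minimal NumPy sketch:

```python
import numpy as np

def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label vector matches exactly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float((y_true == y_pred).all(axis=1).mean())

y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 1],
                   [0, 1, 1],   # one wrong label fails the whole row
                   [1, 1, 0]])
print(subset_accuracy(y_true, y_pred))  # 2 of 3 rows match exactly
```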
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
# Assuming 'combined_csv_data_4_model', 'features_to_include', and 'output_columns' are defined
# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(combined_csv_data_4_model[features_to_include])
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, combined_csv_data_4_model[output_columns], test_size=0.2, random_state=42)
# Convert dataset to tensors
X_train_tensor = torch.FloatTensor(X_train)
y_train_tensor = torch.FloatTensor(y_train.values) # For multi-label
X_test_tensor = torch.FloatTensor(X_test)
y_test_tensor = torch.FloatTensor(y_test.values) # For multi-label
# DataLoader setup
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
# Model setup
model = nn.Sequential(
nn.Linear(len(features_to_include), 64),
nn.BatchNorm1d(64),
nn.ReLU(),
nn.Linear(64, 64),
nn.BatchNorm1d(64),
nn.ReLU(),
nn.Linear(64, len(output_columns))
)
# Loss and optimizer setup
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop
model.train()
for epoch in range(10): # Number of epochs
for inputs, labels in train_loader:
# NOTE: rebuilding Adam on every batch discards the optimizer state, and
# the learning rate grows with each epoch (0.0001 * (epoch + 1)); this
# likely contributes to the NaN loss reported in the output below.
lr = 0.0001 * (epoch + 1)
optimizer = optim.Adam(model.parameters(), lr=lr)
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
print(f'Epoch {epoch+1}/10, Loss: {loss.item():.4f}')
# Testing loop
model.eval()
correct = 0
total = 0
with torch.no_grad():
for inputs, labels in test_loader:
outputs = model(inputs)
predicted = torch.sigmoid(outputs) > 0.5 # Applying sigmoid and threshold for multi-label
total += labels.size(0)
correct += (predicted == labels.byte()).all(dim=1).sum().item() # Adjust for multi-label accuracy
accuracy = 100 * correct / total
print(f'Accuracy: {accuracy:.2f}%')
Epoch 1/10, Loss: nan Epoch 2/10, Loss: nan Epoch 3/10, Loss: nan Epoch 4/10, Loss: nan Epoch 5/10, Loss: nan Epoch 6/10, Loss: nan Epoch 7/10, Loss: nan Epoch 8/10, Loss: nan Epoch 9/10, Loss: nan Epoch 10/10, Loss: nan Accuracy: 0.00%
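A loss that is NaN from epoch 1 usually points to non-finite values in the inputs or targets; the ExtraSensory feature columns contain many missing values ('nan'), which StandardScaler propagates. A pre-flight check and column-mean imputation, sketched on a hypothetical small matrix:

```python
import numpy as np

# Hypothetical feature matrix with missing/non-finite entries,
# as is common in the ExtraSensory feature files.
X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0,    6.0],
              [7.0, 8.0,    np.inf]])

print(np.isnan(X).any(), np.isinf(X).any())  # True True

# Replace every non-finite entry with its column mean over finite values.
X_finite = np.where(np.isfinite(X), X, np.nan)
col_means = np.nanmean(X_finite, axis=0)
rows, cols = np.where(np.isnan(X_finite))
X_clean = X_finite.copy()
X_clean[rows, cols] = col_means[cols]

print(np.isfinite(X_clean).all())  # True
```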
subset_accuracy(y_test, predictions.toarray())
0.6770670826833073
testing_df = pd.DataFrame(predictions.toarray())
testing_df.columns = y_test.columns
print('Predicted | Real Value')
cols = y_test.columns
for i in range(1):
for col in cols:
print(testing_df.iloc[i][col], y_test.iloc[i][col])
Predicted | Real Value 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
print(predictions.toarray()[1])
[1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
print(predictions)
(0, 0) 1.0 (4, 0) 1.0 (5, 0) 1.0 (6, 0) 1.0 (7, 0) 1.0 (8, 0) 1.0 (12, 0) 1.0 (14, 0) 1.0 (21, 0) 1.0 (23, 0) 1.0 (27, 0) 1.0 (30, 0) 1.0 (36, 0) 1.0 (41, 0) 1.0 (43, 0) 1.0 (45, 0) 1.0 (52, 0) 1.0 (56, 0) 1.0 (57, 0) 1.0 (58, 0) 1.0 (62, 0) 1.0 (69, 0) 1.0 (72, 0) 1.0 (76, 0) 1.0 (79, 0) 1.0 : : (782, 51) 1.0 (784, 51) 1.0 (801, 51) 1.0 (813, 51) 1.0 (814, 51) 1.0 (816, 51) 1.0 (822, 51) 1.0 (848, 51) 1.0 (858, 51) 1.0 (860, 51) 1.0 (893, 51) 1.0 (947, 51) 1.0 (966, 51) 1.0 (997, 51) 1.0 (1019, 51) 1.0 (1032, 51) 1.0 (1054, 51) 1.0 (1090, 51) 1.0 (1115, 51) 1.0 (1119, 51) 1.0 (1137, 51) 1.0 (1147, 51) 1.0 (1209, 51) 1.0 (1225, 51) 1.0 (1242, 51) 1.0
y_true.head()
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[46], line 1 ----> 1 y_true.head() NameError: name 'y_true' is not defined
# Scale features
from sklearn.preprocessing import MinMaxScaler
s1 = MinMaxScaler(feature_range=(-1,1))
user_df_scaled = s1.fit_transform(user_df)
user_df_scaled = pd.DataFrame(user_df_scaled, columns=features_to_include)
# Create sequences
look_back = 4
generator = TimeseriesGenerator(user_df_scaled.values, user_df_scaled.values,
length=look_back, batch_size=1)
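With `length=4` and `batch_size=1`, `TimeseriesGenerator` yields inputs shaped `(1, 4, n_features)` paired with the row immediately after each window. The same windowing, sketched in plain NumPy on hypothetical toy data:

```python
import numpy as np

data = np.arange(12, dtype=float).reshape(6, 2)  # 6 timesteps, 2 features
look_back = 4

# Each sample: rows [i, i + look_back) as the input window,
# row i + look_back as the target.
X_seq = np.stack([data[i:i + look_back] for i in range(len(data) - look_back)])
y_seq = data[look_back:]

print(X_seq.shape, y_seq.shape)  # (2, 4, 2) (2, 2)
```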
def create_lstm_model(input_shape, num_features):
model = Sequential([
LSTM(units = num_features, activation='relu', input_shape=input_shape, return_sequences=True),
Dropout(0.2),
Dense(num_features),
LSTM(units=num_features, return_sequences=True),
Dropout(0.2),
LSTM(units=num_features, return_sequences=True),
Dense(num_features),
Dropout(0.2),
LSTM(units=num_features, return_sequences=True),
Dense(num_features),
Dropout(0.2),
LSTM(units=num_features),
Dense(num_features),
Activation('linear'),
])
model.compile(optimizer='adam', loss='mse', metrics = ['accuracy'])
return model
model_path = 'LSTM_next_model.h5'
# Define and compile the LSTM model
model = create_lstm_model((look_back, len(features_to_include)), len(features_to_include))
import keras
if os.path.exists(model_path):
model = load_model(model_path)  # assign the result; a bare load_model() call discards the loaded model
else:
model.fit(generator, epochs=1)
model.save(model_path)
def predict_from_df(df, model, features_to_include, look_back=4):
df_filtered = df[features_to_include]
if len(df_filtered) >= look_back:
# Extract the last `look_back` rows for the prediction
last_sequences = df_filtered[-look_back:].values.reshape((1, look_back, len(features_to_include)))
else:
raise ValueError(f"DataFrame must have at least {look_back} rows for prediction.")
# Predict the next row using the LSTM model
predictions = model.predict(last_sequences)
return predictions
def compare_predictions_with_actual(df, model, features_to_include, look_back=3):
# NOTE: this uses the global user_df rather than the df argument, and its
# look_back default (3) differs from the value passed in the call below (4).
predictions = predict_from_df(user_df.iloc[1:5], model, features_to_include, look_back)
predictions = s1.inverse_transform(predictions)
actual_values = user_df.iloc[4][features_to_include].values # Adjust index if needed
return predictions, actual_values
# Assuming 'user_df' is already preprocessed appropriately, including scaling
predictions, actual_values = compare_predictions_with_actual(user_df, model, features_to_include, look_back=4)
# Now you can compare 'predictions' with 'actual_values'
# Note: If your data was scaled, you might need to inverse scale both predictions and actual values before comparison
print(f"{'Predictions':<15} | {'Actual Values':<15} | {'Diff':<15} | {'Diff %':<15}")
print("-" * 47) # Adjust the number based on the width of your columns
j = 0
for i in range(len(predictions[j])):
p = predictions[j][i]
a = actual_values[i]
diff = a-p
diff_p = ((p-a)/a)*100
print(f"{p:.6f}{'':<9} | {a:<15} | {diff:.6f} | {diff_p:.6f}")
1/1 [==============================] - 1s 859ms/step Predictions | Actual Values | Diff | Diff % ----------------------------------------------- 1.087502 | 0.992139 | -0.095363 | 9.611823 0.321921 | 0.008221 | -0.313700 | 3815.839554 0.173169 | -0.010007 | -0.183176 | -1830.479131 0.445987 | 0.017953 | -0.428034 | 2384.189988 0.838229 | 0.988784 | 0.150555 | -15.226278 1.023328 | 0.992739 | -0.030589 | 3.081304 1.255472 | 0.995041 | -0.260431 | 26.172922 1.498911 | 1.481067 | -0.017844 | 1.204784 6.612895 | 6.684577 | 0.071682 | -1.072342 4.992745 | 5.043079 | 0.050334 | -0.998082 1.474535 | 0.000594 | -1.473941 | 248138.173398 2.362392 | 0.00138 | -2.361012 | 171087.822370 1.889040 | 0.000869 | -1.888171 | 217280.896811 2.147585 | 0.013867 | -2.133718 | 15387.018931 0.849614 | 0.429784 | -0.419830 | 97.683986 4.013337 | 0.108995 | -3.904342 | 3582.129142 0.449318 | 0.354559 | -0.094759 | 26.725983 -0.031556 | -0.009355 | 0.022201 | 237.316455 0.014177 | 0.03863 | 0.024453 | -63.300321 -0.017402 | 0.991321 | 1.008723 | -101.755454 0.456519 | 0.004346 | -0.452173 | 10404.353118 0.346918 | 0.004645 | -0.342273 | 7368.640997 0.443765 | 0.008302 | -0.435463 | 5245.273394 -0.000225 | -0.06522 | -0.064995 | -99.654303 -0.031119 | -0.071359 | -0.040240 | -56.390348 0.018305 | -0.345674 | -0.363979 | -105.295327 1.537729 | 0.007862 | -1.529867 | 19459.002626 1.347964 | 0.017157 | -1.330807 | 7756.639107 1.943403 | 0.030883 | -1.912520 | 6192.793713 3.123033 | 0.042611 | -3.080422 | 7229.169862 0.917343 | 0.002787 | -0.914556 | 32815.077117 1.449819 | 0.004164 | -1.445655 | 34717.941594 1.988410 | 0.006372 | -1.982038 | 31105.431815 1.481999 | 0.726105 | -0.755894 | 104.102514 4.577291 | 4.771319 | 0.194028 | -4.066538 3.205980 | 4.166487 | 0.960507 | -23.053172 2.747242 | 3.367551 | 0.620309 | -18.420182 3.032277 | 3.929054 | 0.896777 | -22.824251 2.721468 | 3.223763 | 0.502295 | -15.581009 3.079525 | 4.54848 | 1.468955 | -32.295509 2.694658 | 1.581582 | -1.113076 | 
70.377399 4.014255 | 3.15225 | -0.862005 | 27.345707 0.359544 | 0.010827 | -0.348717 | 3220.807795 -0.047386 | 5.2e-05 | 0.047438 | -91227.119958 -0.008391 | 0.000159 | 0.008550 | -5377.508647 0.152547 | -0.000403 | -0.152950 | -37952.876240 1.134649 | 0.018501 | -1.116148 | 6032.904603 1.307168 | 0.002943 | -1.304225 | 44316.166105 1.167371 | 0.002244 | -1.165127 | 51921.892738 0.010832 | -0.478131 | -0.488963 | -102.265437 0.013491 | -0.170299 | -0.183790 | -107.921990 0.010864 | 0.145422 | 0.134558 | -92.529241 224.925583 | 109.779389 | -115.146194 | 104.888718 61.624428 | 0.734185 | -60.890243 | 8293.583061 -26.361868 | 0.494123 | 26.855991 | -5435.082136 77.771667 | 0.986627 | -76.785040 | 7782.580497 221.473618 | 109.246493 | -112.227125 | 102.728354 239.007217 | 109.743838 | -129.263379 | 117.786458 276.287628 | 110.346706 | -165.940922 | 150.381401 1.701550 | 2.463646 | 0.762096 | -30.933674 4.537919 | 5.620379 | 1.082460 | -19.259554 2.681944 | 5.045849 | 2.363905 | -46.848506 2.527126 | 0.000193 | -2.526933 | 1309291.748102 2.538119 | 0.003022 | -2.535097 | 83888.048950 2.616662 | 0.002264 | -2.614398 | 115476.944587 0.230488 | 0.016411 | -0.214077 | 1304.474531 0.463765 | 0.430593 | -0.033172 | 7.703690 4.032182 | 0.8269 | -3.205282 | 387.626341 0.405648 | 0.174487 | -0.231161 | 132.480353 149.449905 | 73.192243 | -76.257662 | 104.188175 -141.063446 | -81.283276 | 59.780170 | 73.545473 -35.202381 | -9.281783 | 25.920598 | 279.263134 48.087212 | 0.75371 | -47.333502 | 6280.068144 57.660343 | 0.752368 | -56.907975 | 7563.848432 77.367630 | 0.832032 | -76.535598 | 9198.636351 -0.024326 | 0.092219 | 0.116545 | -126.378040 0.001552 | -0.130812 | -0.132364 | -101.186448 -0.011629 | 0.170884 | 0.182513 | -106.805483 0.962416 | 0.999895 | 0.037479 | -3.748324 0.494964 | 0.999896 | 0.504932 | -50.498478 0.494380 | 0.999893 | 0.505513 | -50.556734 0.492817 | 0.999876 | 0.507059 | -50.712175 0.355554 | 0.0 | -0.355554 | inf 4.520901 | 0.0 | -4.520901 | inf 0.288122 
| 0.008365 | -0.279757 | 3344.372503 0.315624 | 0.013015 | -0.302609 | 2325.079203 73.248192 | 20.282 | -52.966192 | 261.148762 55.153439 | 0.0 | -55.153439 | inf -0.963250 | -6.907755 | -5.944505 | -86.055523 0.001241 | 3e-06 | -0.001238 | 41281.937141 0.001531 | 4e-06 | -0.001527 | 38170.902587 -0.000485 | 0.0 | 0.000485 | -inf 0.001622 | 0.0 | -0.001622 | inf 0.004261 | 1e-06 | -0.004260 | 426034.839654 0.014713 | 2e-06 | -0.014711 | 735527.999529 1.941853 | 4.847216 | 2.905363 | -59.938792 -0.771159 | -0.530176 | 0.240983 | 45.453384 -0.272603 | 0.410488 | 0.683091 | -166.409487 -0.663139 | -0.51432 | 0.148819 | 28.935073 -0.259132 | -0.736442 | -0.477310 | -64.812984 -0.112731 | -0.275571 | -0.162840 | -59.091770 -0.459598 | -0.885504 | -0.425906 | -48.097627 -0.090833 | -0.415427 | -0.324594 | -78.134912 -0.265383 | -0.461418 | -0.196035 | -42.485325 -0.054731 | -0.208905 | -0.154174 | -73.800832 -0.192168 | -0.374458 | -0.182290 | -48.680912 -0.248281 | -0.445467 | -0.197186 | -44.265087 -0.104653 | -0.291257 | -0.186604 | -64.068498 4.785089 | 1.817652 | -2.967437 | 163.256635 1.070282 | 1.004382 | -0.065900 | 6.561282 0.629550 | 0.576957 | -0.052593 | 9.115581 0.522728 | 0.549768 | 0.027040 | -4.918510 0.405758 | 0.368244 | -0.037514 | 10.187282 0.339536 | 0.237837 | -0.101699 | 42.759762 0.452471 | 0.327434 | -0.125037 | 38.186974 0.304638 | 0.257177 | -0.047461 | 18.454755 0.304042 | 0.247374 | -0.056668 | 22.907636 0.225618 | 0.23933 | 0.013712 | -5.729151 0.227806 | 0.182181 | -0.045625 | 25.043613 0.264394 | 0.155518 | -0.108876 | 70.008626 0.243714 | 0.180659 | -0.063055 | 34.902652 -0.581565 | 8.149602 | 8.731167 | -107.136116 -6.225636 | -8.115568 | -1.889932 | -23.287741 0.498104 | 0.0 | -0.498104 | inf 0.515633 | 0.0 | -0.515633 | inf 0.485093 | 0.0 | -0.485093 | inf 0.491543 | 1.0 | 0.508457 | -50.845724 0.494612 | 1.0 | 0.505388 | -50.538802 0.505099 | 0.0 | -0.505099 | inf 0.503978 | 0.0 | -0.503978 | inf 0.507959 | 0.0 | -0.507959 | inf 
0.507657 | 0.0 | -0.507657 | inf 0.491726 | 0.0 | -0.491726 | inf 0.502110 | 0.0 | -0.502110 | inf 0.501529 | 0.0 | -0.501529 | inf 0.492409 | 1.0 | 0.507591 | -50.759065 0.496987 | 0.0 | -0.496987 | inf 0.498913 | 0.0 | -0.498913 | inf 0.487711 | 1.0 | 0.512289 | -51.228896 0.500403 | 0.0 | -0.500403 | inf 0.496930 | 0.0 | -0.496930 | inf 0.504891 | 0.0 | -0.504891 | inf 0.506478 | 0.0 | -0.506478 | inf 0.498209 | 1.0 | 0.501791 | -50.179106 0.498195 | 0.0 | -0.498195 | inf 0.504348 | 0.0 | -0.504348 | inf 0.496296 | 1.0 | 0.503704 | -50.370422 0.506261 | 0.0 | -0.506261 | inf 0.494220 | 0.0 | -0.494220 | inf 0.530882 | 0.16 | -0.370882 | 231.800953 0.491874 | 0.0 | -0.491874 | inf 0.505057 | 0.0 | -0.505057 | inf 0.490880 | 0.0 | -0.490880 | inf 0.508880 | 1.0 | 0.491120 | -49.111956 0.490089 | 1.0 | 0.509911 | -50.991109 0.515844 | 0.0 | -0.515844 | inf 0.503154 | 0.0 | -0.503154 | inf 0.499329 | 0.0 | -0.499329 | inf 1448656384.000000 | 1448316936.0 | -339448.000000 | 0.023437
/var/folders/9d/r4wkb8dj54b5k6_vd_8stbq00000gn/T/ipykernel_93077/1789851892.py:37: RuntimeWarning: divide by zero encountered in double_scalars diff_p = ((p-a)/a)*100
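The divide-by-zero RuntimeWarning above comes from actual values of exactly 0.0 in the percent-difference formula `diff_p = ((p-a)/a)*100`. A guarded version, sketched with NumPy:

```python
import numpy as np

predicted = np.array([0.5, 4.5, 0.3])
actual = np.array([0.0, 0.0, 0.4])

# Compute the percent difference only where the denominator is nonzero;
# report NaN (rather than inf) for zero actual values.
with np.errstate(divide='ignore', invalid='ignore'):
    pct = np.where(actual != 0, (predicted - actual) / actual * 100, np.nan)

print(pct)  # nan for the zero denominators, about -25.0 for the last entry
```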
Testing CNN And LSTM for prediction¶
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Conv1D, MaxPooling1D, Dropout, Reshape
from tensorflow.keras.optimizers import Adam
# Adjust the input shape according to your dataset
input_shape = (X_train.shape[1], 1) # Assuming non-sequential data for simplicity
model = Sequential([
Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=input_shape),
MaxPooling1D(pool_size=2),
Conv1D(filters=128, kernel_size=3, activation='relu'),
MaxPooling1D(pool_size=2),
# Instead of Flatten, use Reshape or adjust the model so it's suitable for LSTM input
# Reshape example (adjust the target shape according to your needs):
# This line is illustrative; actual reshaping depends on the output shape of the previous layer
Reshape((-1, 128)), # Adjust the target shape
LSTM(50, return_sequences=False), # If you want the LSTM to output a sequence, set return_sequences=True
Dropout(0.5),
Dense(100, activation='relu'),
Dense(len(label_columns), activation='sigmoid') # Use 'sigmoid' for multi-label classification
])
model.compile(optimizer=Adam(learning_rate=0.001),
loss='binary_crossentropy', # Use 'binary_crossentropy' for multi-label classification
metrics=['accuracy'])
model.summary()
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d (Conv1D) (None, 33, 64) 256
max_pooling1d (MaxPooling1 (None, 16, 64) 0
D)
conv1d_1 (Conv1D) (None, 14, 128) 24704
max_pooling1d_1 (MaxPoolin (None, 7, 128) 0
g1D)
reshape (Reshape) (None, 7, 128) 0
lstm (LSTM) (None, 50) 35800
dropout (Dropout) (None, 50) 0
dense (Dense) (None, 100) 5100
dense_1 (Dense) (None, 52) 5252
=================================================================
Total params: 71112 (277.78 KB)
Trainable params: 71112 (277.78 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
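The summary's parameter counts can be verified by hand: a Conv1D layer has (kernel_size × in_channels + 1) × filters parameters, an LSTM has 4 × (input_dim + units + 1) × units, and a Dense layer has (inputs + 1) × units. A quick check against the table above:

```python
def conv1d_params(kernel_size, in_channels, filters):
    # One kernel per filter plus a bias per filter.
    return (kernel_size * in_channels + 1) * filters

def lstm_params(input_dim, units):
    # Four gates, each with input, recurrent, and bias weights.
    return 4 * (input_dim + units + 1) * units

def dense_params(inputs, units):
    return (inputs + 1) * units

total = (conv1d_params(3, 1, 64)       # conv1d:   256
         + conv1d_params(3, 64, 128)   # conv1d_1: 24704
         + lstm_params(128, 50)        # lstm:     35800
         + dense_params(50, 100)       # dense:    5100
         + dense_params(100, 52))      # dense_1:  5252
print(total)  # 71112, matching "Total params" above
```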
# Reshape data for CNN if needed
X_train_reshaped = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test_reshaped = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
# For arrays, assuming X_train_reshaped and y_train are your features and labels respectively
subset_size = 100 # Choose a small size for quick tests
X_train_subset = X_train_reshaped[:subset_size]
y_train_subset = y_train[:subset_size]
# Testing Shape Issues
model_name = 'first_try.h5'
if os.path.exists(model_name):
model = load_model(model_name)
else:
history = model.fit(X_train, y_train,  # NOTE: should be X_train_reshaped to match the Conv1D input shape
epochs=1,
batch_size=64,
validation_split=0.2,
verbose=1)
model.save(model_name)
test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell 51, line 14
      6 history = model.fit(X_train, y_train,
      7                     epochs=1,
      8                     batch_size=64,
      9                     validation_split=0.2,
     10                     verbose=1)
     12 model.save(model_name)
---> 14 test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
     15 print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')

ValueError: Input 0 of layer "sequential_4" is incompatible with the layer: expected shape=(None, 159, 1), found shape=(None, 35, 1)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Conv1D, MaxPooling1D, Dropout, BatchNormalization, Reshape, Bidirectional
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import LearningRateScheduler
import math
def scheduler(epoch, lr):
if epoch < 10:
return lr
else:
return lr * math.exp(-0.1)
callback = LearningRateScheduler(scheduler)
model = Sequential([
Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=input_shape, kernel_regularizer=l2(0.001)),
BatchNormalization(),
MaxPooling1D(pool_size=2),
Conv1D(filters=128, kernel_size=3, activation='relu', kernel_regularizer=l2(0.001)),
BatchNormalization(),
MaxPooling1D(pool_size=2),
Reshape((-1, 128)), # Adjust based on the output shape of the previous layer
Bidirectional(LSTM(100, return_sequences=False)),
Dropout(0.5),
Dense(100, activation='relu', kernel_regularizer=l2(0.001)),
BatchNormalization(),
Dense(len(label_columns), activation='sigmoid') # Adjust based on your label columns
])
model.compile(optimizer=Adam(learning_rate=0.001),
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()
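The LearningRateScheduler defined above holds the rate flat for the first 10 epochs and then decays it by a factor of exp(-0.1) per epoch; note that `model.fit` below is not given the callback, so it has no effect as written. Its trajectory:

```python
import math

def scheduler(epoch, lr):
    # Same schedule as in the notebook cell above.
    if epoch < 10:
        return lr
    return lr * math.exp(-0.1)

lr = 0.001
history = []
for epoch in range(15):
    lr = scheduler(epoch, lr)
    history.append(lr)

print(history[9], history[14])  # 0.001, then 0.001 * exp(-0.5), about 0.000607
```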
model_name = 'second_try.h5'
if os.path.exists(model_name):
model = load_model(model_name)
else:
model.fit(X_train, y_train, epochs=1,batch_size=128, validation_split=0.2)
model.save(model_name)
test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
Model: "sequential_9"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_14 (Conv1D) (None, 157, 64) 256
batch_normalization_5 (Bat (None, 157, 64) 256
chNormalization)
max_pooling1d_14 (MaxPooli (None, 78, 64) 0
ng1D)
conv1d_15 (Conv1D) (None, 76, 128) 24704
batch_normalization_6 (Bat (None, 76, 128) 512
chNormalization)
max_pooling1d_15 (MaxPooli (None, 38, 128) 0
ng1D)
reshape_4 (Reshape) (None, 38, 128) 0
bidirectional_2 (Bidirecti (None, 200) 183200
onal)
dropout_16 (Dropout) (None, 200) 0
dense_22 (Dense) (None, 100) 20100
batch_normalization_7 (Bat (None, 100) 400
chNormalization)
dense_23 (Dense) (None, 52) 5252
=================================================================
Total params: 234680 (916.72 KB)
Trainable params: 234096 (914.44 KB)
Non-trainable params: 584 (2.28 KB)
_________________________________________________________________
2359/2359 - 50s - loss: nan - accuracy: 0.0472 - 50s/epoch - 21ms/step Test accuracy: 0.04721081256866455, Test loss: nan
# Example fitting with callbacks
# model.fit(X_train, y_train, epochs=1,batch_size=128, validation_split=0.2)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Conv1D, MaxPooling1D, Dropout, BatchNormalization, Reshape, Bidirectional
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import LearningRateScheduler
import math
def scheduler(epoch, lr):
if epoch < 10:
return lr
else:
return lr * math.exp(-0.1)
callback = LearningRateScheduler(scheduler)
model = Sequential([
    Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=input_shape, kernel_regularizer=l2(0.001)),
    BatchNormalization(),
    MaxPooling1D(pool_size=2),
    Reshape((-1, 128)),  # Regroups the pooled output into 128-dim vectors; adjust to the previous layer's output shape
    Bidirectional(LSTM(100, return_sequences=False)),
    Dropout(0.5),
    Dense(100, activation='relu', kernel_regularizer=l2(0.001)),
    BatchNormalization(),
    Dense(len(label_columns), activation='sigmoid')  # One sigmoid unit per label for multi-label classification
])
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
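The Reshape((-1, 128)) layer learns nothing; it only regroups the pooled output by packing each pair of consecutive 64-dim timesteps into one 128-dim vector, halving the sequence length (matching the (78, 64) → (39, 128) shapes in the model summary). A pure-numpy sketch of that regrouping:

```python
import numpy as np

# After MaxPooling1D the tensor is (batch, 78, 64); Reshape((-1, 128)) packs each
# pair of consecutive 64-dim timesteps into one 128-dim vector -> (batch, 39, 128)
pooled = np.arange(2 * 78 * 64).reshape(2, 78, 64)
regrouped = pooled.reshape(2, -1, 128)
print(regrouped.shape)  # (2, 39, 128)

# The first regrouped vector is timesteps 0 and 1 concatenated
print(np.array_equal(regrouped[0, 0], np.concatenate([pooled[0, 0], pooled[0, 1]])))  # True
```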
Model: "sequential_3"
_________________________________________________________________
Layer (type)                       Output Shape              Param #
=================================================================
conv1d_4 (Conv1D)                  (None, 157, 64)           256
batch_normalization_3 (BatchNormalization) (None, 157, 64)   256
max_pooling1d_4 (MaxPooling1D)     (None, 78, 64)            0
reshape_2 (Reshape)                (None, 39, 128)           0
bidirectional_1 (Bidirectional)    (None, 200)               183200
dropout_6 (Dropout)                (None, 200)               0
dense_8 (Dense)                    (None, 100)               20100
batch_normalization_4 (BatchNormalization) (None, 100)       400
dense_9 (Dense)                    (None, 52)                5252
=================================================================
Total params: 209464 (818.22 KB)
Trainable params: 209136 (816.94 KB)
Non-trainable params: 328 (1.28 KB)
_________________________________________________________________
model_name = 'third_try.h5'
if os.path.exists(model_name):
    model = load_model(model_name)
else:
    model.fit(X_train, y_train, epochs=1, batch_size=128, validation_split=0.2)
    model.save(model_name)

# Evaluate the model
test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
2359/2359 - 41s - loss: nan - accuracy: 0.0472 - 41s/epoch - 17ms/step
Test accuracy: 0.04721081256866455, Test loss: nan
model_name = 'third_try_2.h5'
if os.path.exists(model_name):
    model = load_model(model_name)
else:
    model.fit(X_train, y_train, epochs=1, batch_size=20, validation_split=0.2)
    model.save(model_name)

# Evaluate the model
test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Conv1D, MaxPooling1D, Dropout, Reshape
from tensorflow.keras.optimizers import Adam

# Adjust the input shape according to your dataset;
# each feature vector is treated as a sequence of scalars
input_shape = (X_train.shape[1], 1)

model = Sequential([
    LSTM(len(label_columns), return_sequences=True, input_shape=input_shape),  # return_sequences=True keeps the full sequence for the Conv1D layers
    Dropout(0.2),
    Conv1D(filters=len(label_columns), kernel_size=2, activation='relu'),
    MaxPooling1D(pool_size=7),
    Conv1D(filters=128, kernel_size=3, activation='relu'),
    MaxPooling1D(pool_size=2),
    Conv1D(filters=128, kernel_size=5, activation='relu'),
    MaxPooling1D(pool_size=2),
    # Instead of Flatten, reshape so the output is suitable for LSTM input;
    # the target shape depends on the output shape of the previous layer
    Reshape((-1, 128)),
    LSTM(50, return_sequences=False),  # False: only the final state feeds the dense layers
    Dropout(0.2),
    Dense(100, activation='relu'),
    Dense(len(label_columns), activation='sigmoid')  # Sigmoid per label for multi-label classification
])
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',  # binary_crossentropy for multi-label classification
              metrics=['accuracy'])
model.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type)                       Output Shape              Param #
=================================================================
lstm_8 (LSTM)                      (None, 159, 52)           11232
dropout_7 (Dropout)                (None, 159, 52)           0
conv1d_5 (Conv1D)                  (None, 158, 52)           5460
max_pooling1d_5 (MaxPooling1D)     (None, 22, 52)            0
conv1d_6 (Conv1D)                  (None, 20, 128)           20096
max_pooling1d_6 (MaxPooling1D)     (None, 10, 128)           0
conv1d_7 (Conv1D)                  (None, 6, 128)            82048
max_pooling1d_7 (MaxPooling1D)     (None, 3, 128)            0
reshape_3 (Reshape)                (None, 3, 128)            0
lstm_9 (LSTM)                      (None, 50)                35800
dropout_8 (Dropout)                (None, 50)                0
dense_10 (Dense)                   (None, 100)               5100
dense_11 (Dense)                   (None, 52)                5252
=================================================================
Total params: 164988 (644.48 KB)
Trainable params: 164988 (644.48 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
model_name = 'fourth_try_2.h5'
if os.path.exists(model_name):
    model = load_model(model_name)
else:
    model.fit(X_train, y_train, epochs=1, validation_split=0.2)
    model.save(model_name)
y_train.shape[1]
52
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Conv1D, MaxPooling1D, Dense, Flatten
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# For LSTM, input should be in the form (samples, timesteps, features);
# here each feature vector is treated as a sequence of scalars
input_shape = (X_train.shape[1], 1)  # Adjust the trailing 1 if your data is already in sequences

model = Sequential()
# Start with an LSTM layer to process sequences
model.add(LSTM(units=64, return_sequences=True, input_shape=input_shape))
model.add(Dropout(0.2))
# Followed by CNN layers for feature extraction from the LSTM outputs
model.add(Conv1D(filters=64, kernel_size=2, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=128, kernel_size=3, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
# Flatten the output to feed into a dense layer
model.add(Flatten())
# Additional dense or LSTM layers can be added here if needed,
# e.g. model.add(LSTM(50, return_sequences=False))
model.add(Dense(100, activation='relu'))
model.add(Dense(y_train.shape[1], activation='sigmoid'))  # One sigmoid output per label ('y' is multi-hot encoded)
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',  # Appropriate for multi-label classification
              metrics=['accuracy'])
model.summary()
Model: "sequential_5"
_________________________________________________________________
Layer (type)                       Output Shape              Param #
=================================================================
lstm_10 (LSTM)                     (None, 159, 64)           16896
dropout_9 (Dropout)                (None, 159, 64)           0
conv1d_8 (Conv1D)                  (None, 158, 64)           8256
max_pooling1d_8 (MaxPooling1D)     (None, 79, 64)            0
conv1d_9 (Conv1D)                  (None, 77, 128)           24704
max_pooling1d_9 (MaxPooling1D)     (None, 38, 128)           0
flatten (Flatten)                  (None, 4864)              0
dense_12 (Dense)                   (None, 100)               486500
dense_13 (Dense)                   (None, 52)                5252
=================================================================
Total params: 541608 (2.07 MB)
Trainable params: 541608 (2.07 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
model_name = 'fifth_try_3.h5'
if os.path.exists(model_name):
    model = load_model(model_name)
else:
    model.fit(X_train, y_train, epochs=1, validation_split=0.2)
    model.save(model_name)
2024-02-13 07:19:00.028844: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] model_pruner failed: INVALID_ARGUMENT: Graph does not contain terminal node Adam/AssignAddVariableOp.
7547/7547 [==============================] - 222s 29ms/step - loss: 0.1694 - accuracy: 0.0482 - val_loss: 0.1682 - val_accuracy: 0.0476
/Users/zaina/miniconda3/lib/python3.10/site-packages/keras/src/engine/training.py:3103: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')`.
saving_api.save_model(
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Conv1D, MaxPooling1D, Dense, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import Callback
from sklearn.metrics import precision_score, recall_score, f1_score

# Assuming 'X' and 'y' are your features and labels, respectively
# Data Preprocessing
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)  # Normalize features to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Model Definition
model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(X_train.shape[1], 1)),
    Dropout(0.2),
    Conv1D(64, kernel_size=2, activation='relu'),
    MaxPooling1D(pool_size=2),
    Conv1D(128, kernel_size=3, activation='relu'),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(100, activation='relu'),
    Dense(y_train.shape[1], activation='sigmoid')  # Output layer: one sigmoid per label
])
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])  # Add other metrics as needed
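With 52 sigmoid outputs, Keras's plain 'accuracy' metric is hard to interpret for multi-label targets, which helps explain the flat ~0.047 values seen throughout. Per-label binary accuracy and exact-match accuracy are more telling; a pure-numpy sketch (the function names here are illustrative, not a Keras API):

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    # Fraction of individual label decisions that are correct
    y_pred = (y_prob >= threshold).astype(int)
    return float((y_pred == y_true).mean())

def exact_match(y_true, y_prob, threshold=0.5):
    # Fraction of examples whose entire label vector is predicted correctly
    y_pred = (y_prob >= threshold).astype(int)
    return float((y_pred == y_true).all(axis=1).mean())

y_true = np.array([[1, 0, 1], [0, 0, 1]])
y_prob = np.array([[0.9, 0.2, 0.4], [0.1, 0.3, 0.8]])
print(binary_accuracy(y_true, y_prob))  # 5 of 6 label decisions correct
print(exact_match(y_true, y_prob))      # only the second example matches exactly
```

Keras's built-in `tf.keras.metrics.BinaryAccuracy` computes the same per-decision quantity and could be passed in `metrics=[...]` instead of 'accuracy'.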
# Custom Callback for Precision, Recall, F1 Score
class MetricsCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        val_predict = np.asarray(self.model.predict(X_test)).round()
        val_targ = y_test
        _val_precision = precision_score(val_targ, val_predict, average='micro')
        _val_recall = recall_score(val_targ, val_predict, average='micro')
        _val_f1 = f1_score(val_targ, val_predict, average='micro')
        print(f' — val_precision: {_val_precision:.4f} — val_recall: {_val_recall:.4f} — val_f1: {_val_f1:.4f}')

# Model Training
model.fit(X_train, y_train,
          validation_data=(X_test, y_test),
          epochs=1,          # Adjust number of epochs as necessary
          batch_size=32,     # Adjust batch size as necessary
          callbacks=[MetricsCallback()])
# Note: This is a simplified example. In practice, adjust the model architecture,
# preprocessing steps, and training parameters to your dataset and task.
model_name = 'fifth_try_4.h5'
if os.path.exists(model_name):
    model = load_model(model_name)
else:
    model.fit(X_train, y_train, validation_split=0.2, epochs=1, batch_size=32, callbacks=[MetricsCallback()])
2024-02-13 08:30:31.580864: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] model_pruner failed: INVALID_ARGUMENT: Graph does not contain terminal node Adam/AssignAddVariableOp.
2359/2359 [==============================] - 21s 9ms/step
/Users/zaina/miniconda3/lib/python3.10/site-packages/sklearn/metrics/_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
 — val_precision: 0.0000 — val_recall: 0.0000 — val_f1: 0.0000
9434/9434 [==============================] - 302s 32ms/step - loss: 0.1690 - accuracy: 0.0474 - val_loss: 0.1682 - val_accuracy: 0.0472
2359/2359 [==============================] - 21s 9ms/step
 — val_precision: 0.0000 — val_recall: 0.0000 — val_f1: 0.0000
7547/7547 [==============================] - 244s 32ms/step - loss: 0.1677 - accuracy: 0.0468 - val_loss: 0.1676 - val_accuracy: 0.0476
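The UndefinedMetricWarning above fires because the model predicts no positive labels at all, leaving precision with a zero denominator. As the warning message suggests, passing zero_division=0 to the sklearn metrics handles that case silently; a minimal standalone check:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = np.array([[1, 0], [0, 1]])
y_pred = np.zeros_like(y_true)  # a model that never predicts a positive label

# zero_division=0 returns 0.0 silently when there are no predicted positives
p = precision_score(y_true, y_pred, average='micro', zero_division=0)
r = recall_score(y_true, y_pred, average='micro', zero_division=0)
f = f1_score(y_true, y_pred, average='micro', zero_division=0)
print(p, r, f)  # 0.0 0.0 0.0
```

Adding the same keyword inside MetricsCallback would make its output explicit about the all-negative predictions instead of warning on every epoch.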
hierarchy = build_hierarchy(X.columns)
formatted_hierarchy = format_hierarchy(hierarchy)
print(formatted_hierarchy)
- raw_acc:
- magnitude_stats:
- mean
- std
- moment3
- moment4
- percentile25
- percentile50
- percentile75
- value_entropy
- time_entropy
- magnitude_spectrum:
- log_energy_band0
- log_energy_band1
- log_energy_band2
- log_energy_band3
- log_energy_band4
- spectral_entropy
- magnitude_autocorrelation:
- period
- normalized_ac
- 3d:
- mean_x
- mean_y
- mean_z
- std_x
- std_y
- std_z
- ro_xy
- ro_xz
- ro_yz
- proc_gyro:
- magnitude_stats:
- mean
- std
- moment3
- moment4
- percentile25
- percentile50
- percentile75
- value_entropy
- time_entropy
- magnitude_spectrum:
- log_energy_band0
- log_energy_band1
- log_energy_band2
- log_energy_band3
- log_energy_band4
- spectral_entropy
- magnitude_autocorrelation:
- period
- normalized_ac
- 3d:
- mean_x
- mean_y
- mean_z
- std_x
- std_y
- std_z
- ro_xy
- ro_xz
- ro_yz
- raw_magnet:
- magnitude_stats:
- mean
- std
- moment3
- moment4
- percentile25
- percentile50
- percentile75
- value_entropy
- time_entropy
- magnitude_spectrum:
- log_energy_band0
- log_energy_band1
- log_energy_band2
- log_energy_band3
- log_energy_band4
- spectral_entropy
- magnitude_autocorrelation:
- period
- normalized_ac
- 3d:
- mean_x
- mean_y
- mean_z
- std_x
- std_y
- std_z
- ro_xy
- ro_xz
- ro_yz
- avr_cosine_similarity_lag_range0
- avr_cosine_similarity_lag_range1
- avr_cosine_similarity_lag_range2
- avr_cosine_similarity_lag_range3
- avr_cosine_similarity_lag_range4
- location:
- num_valid_updates
- log_latitude_range
- log_longitude_range
- best_horizontal_accuracy
- diameter
- log_diameter
- location_quick_features:
- std_lat
- std_long
- lat_change
- long_change
- mean_abs_lat_deriv
- mean_abs_long_deriv
- audio_naive:
- mfcc0:
- mean
- std
- mfcc1:
- mean
- std
- mfcc2:
- mean
- std
- mfcc3:
- mean
- std
- mfcc4:
- mean
- std
- mfcc5:
- mean
- std
- mfcc6:
- mean
- std
- mfcc7:
- mean
- std
- mfcc8:
- mean
- std
- mfcc9:
- mean
- std
- mfcc10:
- mean
- std
- mfcc11:
- mean
- std
- mfcc12:
- mean
- std
- audio_properties:
- max_abs_value
- normalization_multiplier
- discrete:
- app_state:
- is_active
- is_inactive
- is_background
- missing
- battery_plugged:
- is_ac
- is_usb
- is_wireless
- missing
- battery_state:
- is_unknown
- is_unplugged
- is_not_charging
- is_discharging
- is_charging
- is_full
- missing
- on_the_phone:
- is_False
- is_True
- missing
- ringer_mode:
- is_normal
- is_silent_no_vibrate
- is_silent_with_vibrate
- missing
- wifi_status:
- is_not_reachable
- is_reachable_via_wifi
- is_reachable_via_wwan
- missing
- time_of_day:
- between0and6
- between3and9
- between6and12
- between9and15
- between12and18
- between15and21
- between18and24
- between21and3
- lf_measurements:
- battery_level
- timestamp_numeric
Testing Batch Sizes¶
from tensorflow.keras.models import clone_model  # needed to rebuild the architecture per batch size

def find_best_batch_size(model, X_train, y_train, X_test, y_test, batch_sizes):
    """
    Trains a given model using different batch sizes, evaluates performance on test data,
    stores each trained model, and returns the best batch size along with its accuracy
    and a dictionary of models.

    Parameters:
    - model: The initial model to be trained.
    - X_train, y_train: Training data and labels.
    - X_test, y_test: Test data and labels.
    - batch_sizes: List of batch sizes to test.

    Returns:
    - best_batch_size: The batch size yielding the highest accuracy on test data.
    - best_acc: The highest accuracy achieved on test data.
    - models_dict: A dictionary of saved model filenames keyed by their batch sizes.
    """
    models_dict = {}
    best_acc = 0
    best_batch_size = None
    for batch_size in batch_sizes:
        print(f"Training with batch size: {batch_size}")
        # Clone the original model architecture for a fair comparison
        model_clone = clone_model(model)
        model_clone.compile(optimizer=model.optimizer, loss=model.loss, metrics=model.metrics)
        # Fit the model
        model_clone.fit(X_train, y_train,
                        epochs=2,
                        batch_size=batch_size,
                        validation_split=0.2,
                        verbose=1)
        # Evaluate the model
        test_loss, test_acc = model_clone.evaluate(X_test, y_test, verbose=2)
        print(f"Test accuracy: {test_acc}, Test loss: {test_loss}")
        # Save the model
        model_file_name = f'ExtraSensory_CNN_LSTM_bs{batch_size}.h5'
        model_clone.save(model_file_name)
        models_dict[batch_size] = model_file_name
        # Update the best result if the current batch size does better
        if test_acc > best_acc:
            best_acc = test_acc
            best_batch_size = batch_size
    print(f"Best Batch Size: {best_batch_size} with Test Accuracy: {best_acc}")
    return best_batch_size, best_acc, models_dict
# Example usage:
batch_sizes = [128, 64, 16, 4, 1, None]  # None falls back to Keras's default batch size of 32

# Call the function and store its return values
best_batch_size, best_acc, models_dict = find_best_batch_size(model, X_train_reshaped, y_train, X_test_reshaped, y_test, batch_sizes)

# Now `models_dict` is available outside of the function
print("Available models and their batch sizes:")
for batch_size, model_path in models_dict.items():
    print(f"Batch Size: {batch_size}, Model Path: {model_path}")

# You can load any model from `models_dict` for further use
# selected_model_path = models_dict[best_batch_size]
# loaded_model = load_model(selected_model_path)
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
2359/2359 - 13s - loss: 0.1687 - accuracy: 0.0472 - 13s/epoch - 6ms/step
Test accuracy: 0.04721081256866455, Test loss: 0.16865016520023346
model.save('ExtraSensory_CNN_LSTM_Model_v2.h5')
# Train the model
history = model.fit(X_train_reshaped, y_train,
                    epochs=10,
                    batch_size=64,
                    validation_split=0.2,
                    verbose=1)
Epoch 1/10
3774/3774 [==============================] - 98s 26ms/step - loss: 0.1721 - accuracy: 0.0507 - val_loss: 0.1682 - val_accuracy: 0.0476
Epoch 2/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1692 - accuracy: 0.0539 - val_loss: 0.1683 - val_accuracy: 0.0476
Epoch 3/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1693 - accuracy: 0.0538 - val_loss: 0.1680 - val_accuracy: 0.0476
Epoch 4/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1693 - accuracy: 0.0554 - val_loss: 0.1680 - val_accuracy: 0.0476
Epoch 5/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1694 - accuracy: 0.0572 - val_loss: 0.1683 - val_accuracy: 0.0476
Epoch 6/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1695 - accuracy: 0.0586 - val_loss: 0.1682 - val_accuracy: 0.0476
Epoch 7/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1696 - accuracy: 0.0593 - val_loss: 0.1694 - val_accuracy: 0.0476
Epoch 8/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1697 - accuracy: 0.0593 - val_loss: 0.1684 - val_accuracy: 0.0476
Epoch 9/10
3774/3774 [==============================] - 65s 17ms/step - loss: 0.1698 - accuracy: 0.0610 - val_loss: 0.1681 - val_accuracy: 0.0476
Epoch 10/10
3774/3774 [==============================] - 65s 17ms/step - loss: 0.1698 - accuracy: 0.0625 - val_loss: 0.1681 - val_accuracy: 0.0476
# Train the model
history = model.fit(X_train_reshaped, y_train,
                    epochs=2,
                    batch_size=20,
                    validation_split=0.2,
                    verbose=1)
Epoch 1/2
12075/12075 [==============================] - 219s 18ms/step - loss: 0.1717 - accuracy: 0.0778 - val_loss: 0.1700 - val_accuracy: 0.0476
Epoch 2/2
12075/12075 [==============================] - 191s 16ms/step - loss: 0.1720 - accuracy: 0.0802 - val_loss: 0.1695 - val_accuracy: 0.0476
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
2359/2359 - 15s - loss: 0.1700 - accuracy: 0.0472 - 15s/epoch - 6ms/step
Test accuracy: 0.04721081256866455, Test loss: 0.16999678313732147
model.save('ExtraSensory_CNN_LSTM_Model_v2_bs_20.h5')
# Train the model
history = model.fit(X_train_reshaped, y_train,
                    epochs=2,
                    batch_size=2,
                    validation_split=0.2,
                    verbose=1)
Epoch 1/2
21579/120750 [====>.........................] - ETA: 22:30 - loss: 0.1824 - accuracy: 0.1195
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[230], line 2
      1 # Train the model
----> 2 history = model.fit(X_train_reshaped, y_train,
      3                     epochs=2,
      4                     batch_size=2,
      5                     validation_split=0.2,
      6                     verbose=1)
    [... internal Keras/TensorFlow frames elided: training was interrupted manually ...]
KeyboardInterrupt:
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
model.save('ExtraSensory_CNN_LSTM_Model_v2_bs_2.h5')
# Train the model
history = model.fit(X_train_reshaped, y_train,
                    epochs=2,
                    validation_split=0.2,
                    verbose=1)
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
model.save('ExtraSensory_CNN_LSTM_Model_v2_bs_1.h5')
Testing Thresholds¶
# Calculate the total NaN count for each feature across all users
total_nan_counts = nan_counts_per_user.sum()
# Assuming `X_with_users` is your original DataFrame with the same number of entries per user,
# calculate the total number of entries for a single feature across all users
total_entries_per_feature = len(X_with_users)  # If counts vary per user: len(users) * average_entries_per_user
# Calculate the percentage of missing data for each feature
percentage_missing = (total_nan_counts / total_entries_per_feature) * 100
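The percentage computation above can be exercised on a toy frame to confirm the semantics (the column names here are made up, not ExtraSensory features):

```python
import numpy as np
import pandas as pd

demo_df = pd.DataFrame({
    'feat_a': [1.0, np.nan, 3.0, np.nan],  # 2 of 4 values missing
    'feat_b': [1.0, 2.0, 3.0, 4.0],        # complete
})
pct_missing_demo = (demo_df.isna().sum() / len(demo_df)) * 100
print(pct_missing_demo['feat_a'], pct_missing_demo['feat_b'])  # 50.0 0.0

# Columns exceeding a 25% threshold would be dropped
print(pct_missing_demo[pct_missing_demo > 25].index.tolist())  # ['feat_a']
```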
def testing_threshold(threshold, testing_user, fill_type='ffill', epochs=1):
    # `threshold` is the missing-data percentage above which a column is removed, e.g. 1 for 1%
    # Identify columns that exceed this threshold
    columns_to_remove = percentage_missing[percentage_missing > threshold].index.tolist()
    print("Removing", len(columns_to_remove), "columns out of", len(X_with_users.columns))
    # print("Columns to remove due to excessive missing data:", columns_to_remove)
    features_to_include = [feature for feature in features if feature not in columns_to_remove]

    # Select one user's data (testing_user indexes into `users`)
    user_df = X_with_users[X_with_users['user_id'] == users[testing_user]]
    if fill_type == 'ffill':
        # Forward fill
        user_df = user_df[features_to_include].ffill()
    elif fill_type == 'mean':
        # Fill missing values with the mean of each column
        mean_values = user_df[features_to_include].mean()
        user_df = user_df[features_to_include].fillna(mean_values)
    elif fill_type == 'median':
        # Fill missing values with the median of each column
        median_values = user_df[features_to_include].median()
        user_df = user_df[features_to_include].fillna(median_values)
    elif fill_type == 'zero':
        # Fill missing values with zero
        user_df = user_df[features_to_include].fillna(0)
    else:
        # If no valid fill_type is provided, leave user_df unchanged
        print("Invalid fill_type. No changes made to user_df.")

    scaler = StandardScaler()
    user_df[features_to_include] = scaler.fit_transform(user_df[features_to_include])

    # Define LSTM model architecture
    def create_lstm_model(input_shape):
        model = Sequential([
            LSTM(50, activation='relu', input_shape=input_shape),
            Dense(len(features_to_include), activation='relu')
        ])
        model.compile(optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.001, clipvalue=0.5), loss='mse')
        return model

    # Assuming timestamps are sorted; if not, sort user_df by timestamp here
    user_df.sort_values('timestamp', inplace=True)

    # Convert user_df to sliding windows for the LSTM
    look_back = 3
    generator = TimeseriesGenerator(user_df[features_to_include].values, user_df[features_to_include].values,
                                    length=look_back, batch_size=1)

    # Create and train the LSTM model on the selected user's data
    model = create_lstm_model((look_back, len(features_to_include)))
    model.fit(generator, epochs=epochs, verbose=1)  # Adjust epochs and verbosity as needed
    return model, features_to_include
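Inside testing_threshold, TimeseriesGenerator turns the per-user feature matrix into (look_back, n_features) windows, each paired with the row that follows it as the regression target. A pure-numpy sketch of the same windowing (make_windows is an illustrative stand-in, not the Keras API):

```python
import numpy as np

def make_windows(data, look_back):
    # Mirror TimeseriesGenerator(data, data, length=look_back, batch_size=1):
    # each window of `look_back` consecutive rows predicts the row after it
    windows, targets = [], []
    for i in range(len(data) - look_back):
        windows.append(data[i:i + look_back])
        targets.append(data[i + look_back])
    return np.array(windows), np.array(targets)

demo = np.arange(10, dtype=float).reshape(5, 2)  # 5 timesteps, 2 features
win, tgt = make_windows(demo, look_back=3)
print(win.shape, tgt.shape)  # (2, 3, 2) (2, 2)
print(tgt[0])                # the row at index 3: [6. 7.]
```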
models, features_for_model = {}, {}
model_name = "threshold0_median_epoch1"
models[model_name], features_for_model[model_name] = testing_threshold(0, -1, fill_type='median')
best_model_one_so_far = 'threshold0_median_epoch1'
print(features_for_model[best_model_one_so_far])
192 out of 279 Columns to remove due to excessive missing data: ['raw_acc:magnitude_stats:mean', 'raw_acc:magnitude_stats:std', 'raw_acc:magnitude_stats:moment3', 'raw_acc:magnitude_stats:moment4', 'raw_acc:magnitude_stats:percentile25', 'raw_acc:magnitude_stats:percentile50', 'raw_acc:magnitude_stats:percentile75', 'raw_acc:magnitude_stats:value_entropy', 'raw_acc:magnitude_stats:time_entropy', 'raw_acc:magnitude_spectrum:log_energy_band0', 'raw_acc:magnitude_spectrum:log_energy_band1', 'raw_acc:magnitude_spectrum:log_energy_band2', 'raw_acc:magnitude_spectrum:log_energy_band3', 'raw_acc:magnitude_spectrum:log_energy_band4', 'raw_acc:magnitude_spectrum:spectral_entropy', 'raw_acc:magnitude_autocorrelation:period', 'raw_acc:magnitude_autocorrelation:normalized_ac', 'raw_acc:3d:mean_x', 'raw_acc:3d:mean_y', 'raw_acc:3d:mean_z', 'raw_acc:3d:std_x', 'raw_acc:3d:std_y', 'raw_acc:3d:std_z', 'raw_acc:3d:ro_xy', 'raw_acc:3d:ro_xz', 'raw_acc:3d:ro_yz', 'proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 
'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 
'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 
'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'audio_naive:mfcc0:mean', 'audio_naive:mfcc1:mean', 'audio_naive:mfcc2:mean', 'audio_naive:mfcc3:mean', 'audio_naive:mfcc4:mean', 'audio_naive:mfcc5:mean', 'audio_naive:mfcc6:mean', 'audio_naive:mfcc7:mean', 'audio_naive:mfcc8:mean', 'audio_naive:mfcc9:mean', 'audio_naive:mfcc10:mean', 'audio_naive:mfcc11:mean', 'audio_naive:mfcc12:mean', 'audio_naive:mfcc0:std', 'audio_naive:mfcc1:std', 'audio_naive:mfcc2:std', 'audio_naive:mfcc3:std', 'audio_naive:mfcc4:std', 'audio_naive:mfcc5:std', 'audio_naive:mfcc6:std', 'audio_naive:mfcc7:std', 'audio_naive:mfcc8:std', 'audio_naive:mfcc9:std', 'audio_naive:mfcc10:std', 'audio_naive:mfcc11:std', 'audio_naive:mfcc12:std', 'audio_properties:max_abs_value', 'audio_properties:normalization_multiplier', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 'lf_measurements:battery_level', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length'] WARNING:tensorflow:Layer lstm_31 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU. 
4924/4924 [==============================] - 95s 19ms/step - loss: 1.6473 ['timestamp', 'discrete:app_state:is_active', 'discrete:app_state:is_inactive', 'discrete:app_state:is_background', 'discrete:app_state:missing', 'discrete:battery_plugged:is_ac', 'discrete:battery_plugged:is_usb', 'discrete:battery_plugged:is_wireless', 'discrete:battery_plugged:missing', 'discrete:battery_state:is_unknown', 'discrete:battery_state:is_unplugged', 'discrete:battery_state:is_not_charging', 'discrete:battery_state:is_discharging', 'discrete:battery_state:is_charging', 'discrete:battery_state:is_full', 'discrete:battery_state:missing', 'discrete:on_the_phone:is_False', 'discrete:on_the_phone:is_True', 'discrete:on_the_phone:missing', 'discrete:ringer_mode:is_normal', 'discrete:ringer_mode:is_silent_no_vibrate', 'discrete:ringer_mode:is_silent_with_vibrate', 'discrete:ringer_mode:missing', 'discrete:wifi_status:is_not_reachable', 'discrete:wifi_status:is_reachable_via_wifi', 'discrete:wifi_status:is_reachable_via_wwan', 'discrete:wifi_status:missing', 'discrete:time_of_day:between0and6', 'discrete:time_of_day:between3and9', 'discrete:time_of_day:between6and12', 'discrete:time_of_day:between9and15', 'discrete:time_of_day:between12and18', 'discrete:time_of_day:between15and21', 'discrete:time_of_day:between18and24', 'discrete:time_of_day:between21and3']
## Testing
import numpy as np
from sklearn.preprocessing import StandardScaler

def predict_from_df(df, model, features_to_include, look_back=3):
    """
    Process the given DataFrame and predict the next value using the LSTM model.

    Parameters:
    - df: DataFrame to process and predict from.
    - model: Trained LSTM model to use for predictions.
    - features_to_include: List of feature names to include in the prediction.
    - look_back: Number of previous time steps to use as input for predictions.

    Returns:
    - predictions: Predicted values for the next time step.
    """
    # Ensure the DataFrame contains the necessary features
    if not all(feature in df.columns for feature in features_to_include):
        raise ValueError("DataFrame missing required features")
    # Fill missing values with the column medians
    df_filled = df[features_to_include].fillna(df[features_to_include].median())
    # Scale features. Note: fitting the scaler here, on the prediction data,
    # differs from the scaling the model saw during training; ideally a scaler
    # fitted on the training data would be reused at prediction time.
    scaler = StandardScaler().fit(df_filled)
    df_scaled = scaler.transform(df_filled)
    # Create overlapping sequences of length look_back
    sequences = np.array([df_scaled[i - look_back:i] for i in range(look_back, len(df_scaled) + 1)])
    # Predict using the LSTM model
    predictions = model.predict(sequences)
    return predictions
# Example of how to use the function
# Ensure 'model', 'features_to_include', and 'look_back' are defined as per your model's training setup
predictions = predict_from_df(df_user[features_for_model[best_model_one_so_far]].iloc[1:4],
                              models[best_model_one_so_far],
                              features_for_model[best_model_one_so_far],
                              look_back=3)
print(predictions)
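The sequence-building step inside `predict_from_df` is the part most easily gotten wrong, so here is a minimal sketch of just that windowing logic on a tiny synthetic array (the `make_sequences` helper and the array below are illustrative, not part of the notebook's pipeline), where the expected output shape is easy to check by hand:

```python
# Hypothetical sketch: the same overlapping-window slicing used in
# predict_from_df, applied to a small synthetic (time_steps, features) array.
import numpy as np

def make_sequences(data, look_back=3):
    """Slice a (T, F) array into overlapping (look_back, F) windows."""
    return np.array([data[i - look_back:i] for i in range(look_back, len(data) + 1)])

data = np.arange(20, dtype=float).reshape(5, 4)  # 5 time steps, 4 features
seqs = make_sequences(data, look_back=3)
# 5 steps with look_back=3 yield 5 - 3 + 1 = 3 windows, each of shape (3, 4)
print(seqs.shape)  # (3, 3, 4)
```

With T time steps and a window of length L, the output has T - L + 1 windows, which is why passing exactly `look_back` rows (as in the `.iloc[1:4]` example above) yields a single sequence.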
192 out of 279 Columns to remove due to excessive missing data: [same column list as shown above]
WARNING:tensorflow:Layer lstm_30 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 94s 19ms/step - loss: 0.2351
[same 'timestamp' and 'discrete:*' feature list as shown above]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[78], line 47
     43 print(features_for_model[best_model_one_so_far])
     45 # Example of how to use the function
     46 # Ensure 'model', 'features_to_include', and 'look_back' are defined as per your model's training setup
---> 47 predictions = predict_from_df(df_user.iloc[1:4], models_with_threshhold[best_model_one_so_far], features_to_include[best_model_one_so_far], look_back=3)
     48 print(predictions)

KeyError: 'threshold0_median_epoch1'
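The KeyError above arises from indexing a model dictionary with a key (`'threshold0_median_epoch1'`) that was never stored; the dictionaries built later in the notebook are keyed by integer thresholds. A small hedged sketch of a safer lookup (the dictionary contents and `lookup_model` helper here are illustrative, not the notebook's actual models):

```python
# Hypothetical guard against the KeyError shown above: look up a model with
# .get() and fail with an explicit message listing the keys that do exist.
models_by_key = {0: "model_for_threshold_0", 1: "model_for_threshold_1"}  # illustrative

def lookup_model(models, key):
    model = models.get(key)
    if model is None:
        raise KeyError(f"No model stored under {key!r}; available keys: {sorted(models)}")
    return model

print(lookup_model(models_by_key, 0))
```

Listing the available keys in the error message makes a string-vs-integer key mismatch immediately visible instead of surfacing as a bare KeyError.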
print(features_for_model[model_name])
df_user[features_for_model[model_name]].iloc[4].values
['timestamp', 'discrete:app_state:is_active', 'discrete:app_state:is_inactive', 'discrete:app_state:is_background', 'discrete:app_state:missing', 'discrete:battery_plugged:is_ac', 'discrete:battery_plugged:is_usb', 'discrete:battery_plugged:is_wireless', 'discrete:battery_plugged:missing', 'discrete:battery_state:is_unknown', 'discrete:battery_state:is_unplugged', 'discrete:battery_state:is_not_charging', 'discrete:battery_state:is_discharging', 'discrete:battery_state:is_charging', 'discrete:battery_state:is_full', 'discrete:battery_state:missing', 'discrete:on_the_phone:is_False', 'discrete:on_the_phone:is_True', 'discrete:on_the_phone:missing', 'discrete:ringer_mode:is_normal', 'discrete:ringer_mode:is_silent_no_vibrate', 'discrete:ringer_mode:is_silent_with_vibrate', 'discrete:ringer_mode:missing', 'discrete:wifi_status:is_not_reachable', 'discrete:wifi_status:is_reachable_via_wifi', 'discrete:wifi_status:is_reachable_via_wwan', 'discrete:wifi_status:missing', 'discrete:time_of_day:between0and6', 'discrete:time_of_day:between3and9', 'discrete:time_of_day:between6and12', 'discrete:time_of_day:between9and15', 'discrete:time_of_day:between12and18', 'discrete:time_of_day:between15and21', 'discrete:time_of_day:between18and24', 'discrete:time_of_day:between21and3']
array([1.44831694e+09, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.00000000e+00, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 1.00000000e+00, 1.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00])
models_with_threshhold = {}
features_to_include = {}
for i in range(6):
    print("for threshold:", i)
    models_with_threshhold[i], features_to_include[i] = testing_threshold(i, -1)
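Once each threshold's model has been trained, picking the best one reduces to comparing their losses. A minimal sketch, assuming the per-threshold final training losses were recorded in a dict (the values below are illustrative, taken from the printed logs; `testing_threshold` itself does not return a loss):

```python
# Hedged sketch: select the threshold with the lowest recorded training loss.
losses_by_threshold = {0: 0.7676, 1: 0.9039}  # illustrative values from the logs
best_threshold = min(losses_by_threshold, key=losses_by_threshold.get)
print(best_threshold)  # 0
```

Note that training loss alone favors thresholds that keep fewer, cleaner columns; a held-out validation loss would be a fairer basis for the comparison.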
for threshold: 0
192 out of 279 Columns to remove due to excessive missing data: [same column list as shown above]
WARNING:tensorflow:Layer lstm_2 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 95s 19ms/step - loss: 0.7676 for threshold: 1 165 out of 279 Columns to remove due to excessive missing data: ['proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 
'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 
'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'audio_naive:mfcc0:mean', 'audio_naive:mfcc1:mean', 'audio_naive:mfcc2:mean', 'audio_naive:mfcc3:mean', 'audio_naive:mfcc4:mean', 'audio_naive:mfcc5:mean', 'audio_naive:mfcc6:mean', 'audio_naive:mfcc7:mean', 'audio_naive:mfcc8:mean', 'audio_naive:mfcc9:mean', 'audio_naive:mfcc10:mean', 'audio_naive:mfcc11:mean', 'audio_naive:mfcc12:mean', 'audio_naive:mfcc0:std', 'audio_naive:mfcc1:std', 'audio_naive:mfcc2:std', 'audio_naive:mfcc3:std', 'audio_naive:mfcc4:std', 'audio_naive:mfcc5:std', 'audio_naive:mfcc6:std', 'audio_naive:mfcc7:std', 
'audio_naive:mfcc8:std', 'audio_naive:mfcc9:std', 'audio_naive:mfcc10:std', 'audio_naive:mfcc11:std', 'audio_naive:mfcc12:std', 'audio_properties:max_abs_value', 'audio_properties:normalization_multiplier', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_3 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 95s 19ms/step - loss: 0.9039
for threshold: 2
163 out of 279 Columns to remove due to excessive missing data: [the proc_gyro, raw_magnet, watch_acceleration, watch_heading, location, location_quick_features, audio_naive, and lf_measurements columns from the list above, without the two audio_properties columns]
WARNING:tensorflow:Layer lstm_4 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 95s 19ms/step - loss: 0.9191
for threshold: 3
137 out of 279 Columns to remove due to excessive missing data: [the same columns as the threshold-2 list above, without the 26 audio_naive MFCC columns]
WARNING:tensorflow:Layer lstm_5 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 97s 20ms/step - loss: 0.9159
for threshold: 4
137 out of 279 Columns to remove due to excessive missing data: [identical to the threshold-3 list above]
WARNING:tensorflow:Layer lstm_6 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 97s 20ms/step - loss: 0.9998
for threshold: 5
114 out of 279 Columns to remove due to excessive missing data: ['proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', plus the raw_magnet, watch_acceleration, watch_heading, location, location_quick_features, and lf_measurements columns from the threshold-4 list above]
WARNING:tensorflow:Layer lstm_7 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 96s 19ms/step - loss: 1.0113
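The threshold sweep above drops every feature column whose share of missing values exceeds a cutoff tied to the threshold. A minimal sketch of that column-filtering step, assuming the features live in a pandas DataFrame; `drop_sparse_columns` and the `threshold / 10` cutoff rule are hypothetical stand-ins, since the internals of the notebook's `testing_threshold` are not shown here:

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df, threshold):
    """Drop feature columns whose missing-data ratio exceeds the cutoff.

    Hypothetical helper: the mapping from the integer threshold to a
    missing-data cutoff (here threshold / 10) is an assumption.
    """
    # Fraction of NaN values in each column.
    missing_ratio = df.isna().mean()
    # A higher threshold tolerates more missing data before dropping.
    cutoff = threshold / 10.0
    to_remove = missing_ratio[missing_ratio > cutoff].index.tolist()
    print(f"{len(to_remove)} out of {df.shape[1]} Columns to remove "
          f"due to excessive missing data: {to_remove}")
    return df.drop(columns=to_remove), to_remove

# Toy example: one dense column, one mostly-missing column.
df = pd.DataFrame({
    'raw_acc:mean': [0.1, 0.2, 0.3, 0.4],
    'raw_magnet:3d:mean_x': [np.nan, np.nan, np.nan, 1.0],
})
filtered, removed = drop_sparse_columns(df, 5)
```

With threshold 5 the cutoff is 0.5, so the column that is 75% missing is removed while the fully observed one is kept.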
fill_options = ['mean', 'median', 'zero']
for fill in fill_options:
    print("Checking different fills at 5 threshhold with ", fill, ' fill option')
    models_with_threshhold[fill], features_to_include[fill] = testing_threshold(5, -1, fill_type=fill)
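After the sparse columns are dropped, the remaining NaNs are imputed according to the chosen fill option. A minimal sketch of the three strategies being compared, assuming a pandas DataFrame of features; `fill_missing` is a hypothetical helper, and the real pipeline may compute the mean/median statistics on training data only:

```python
import numpy as np
import pandas as pd

def fill_missing(df, fill_type):
    """Impute remaining NaNs with the chosen strategy (hypothetical helper)."""
    if fill_type == 'mean':
        # Per-column mean, computed over the non-missing values.
        return df.fillna(df.mean())
    elif fill_type == 'median':
        # Per-column median, more robust to outliers than the mean.
        return df.fillna(df.median())
    elif fill_type == 'zero':
        # Constant fill; cheap, but can bias features centered away from 0.
        return df.fillna(0.0)
    raise ValueError(f"unknown fill_type: {fill_type}")

df = pd.DataFrame({'f': [1.0, 3.0, np.nan, 4.0]})
filled = fill_missing(df, 'median')  # NaN -> 3.0 (median of 1, 3, 4)
```

Note that pandas skips NaNs when computing `mean()` and `median()`, so the statistics come only from the observed values in each column.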
Checking different fills at 5 threshhold with mean fill option
114 out of 279 Columns to remove due to excessive missing data: [the same 114 columns as the threshold-5 list above]
WARNING:tensorflow:Layer lstm_11 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 95s 19ms/step - loss: 1.0229
Checking different fills at 5 threshhold with median fill option
114 out of 279 Columns to remove due to excessive missing data: [the same 114 columns as above]
WARNING:tensorflow:Layer lstm_12 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 95s 19ms/step - loss: 0.9555 Checking different fills at 5 threshhold with zero fill option 114 out of 279 Columns to remove due to excessive missing data: ['proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 'watch_acceleration:magnitude_spectrum:log_energy_band0', 
'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 
'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length'] WARNING:tensorflow:Layer lstm_13 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU. 4924/4924 [==============================] - 100s 20ms/step - loss: 0.9557
Testing epochs = 2¶
model_name = "threshold0_median_epoch2"
models_with_threshhold[model_name], features_to_include[model_name] = testing_threshold(0, -1, fill_type = 'median', epochs= 2)
192 out of 279 Columns to remove due to excessive missing data: ['raw_acc:magnitude_stats:mean', 'raw_acc:magnitude_stats:std', 'raw_acc:magnitude_stats:moment3', 'raw_acc:magnitude_stats:moment4', 'raw_acc:magnitude_stats:percentile25', 'raw_acc:magnitude_stats:percentile50', 'raw_acc:magnitude_stats:percentile75', 'raw_acc:magnitude_stats:value_entropy', 'raw_acc:magnitude_stats:time_entropy', 'raw_acc:magnitude_spectrum:log_energy_band0', 'raw_acc:magnitude_spectrum:log_energy_band1', 'raw_acc:magnitude_spectrum:log_energy_band2', 'raw_acc:magnitude_spectrum:log_energy_band3', 'raw_acc:magnitude_spectrum:log_energy_band4', 'raw_acc:magnitude_spectrum:spectral_entropy', 'raw_acc:magnitude_autocorrelation:period', 'raw_acc:magnitude_autocorrelation:normalized_ac', 'raw_acc:3d:mean_x', 'raw_acc:3d:mean_y', 'raw_acc:3d:mean_z', 'raw_acc:3d:std_x', 'raw_acc:3d:std_y', 'raw_acc:3d:std_z', 'raw_acc:3d:ro_xy', 'raw_acc:3d:ro_xz', 'raw_acc:3d:ro_yz', 'proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 
'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 
'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 
'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'audio_naive:mfcc0:mean', 'audio_naive:mfcc1:mean', 'audio_naive:mfcc2:mean', 'audio_naive:mfcc3:mean', 'audio_naive:mfcc4:mean', 'audio_naive:mfcc5:mean', 'audio_naive:mfcc6:mean', 'audio_naive:mfcc7:mean', 'audio_naive:mfcc8:mean', 'audio_naive:mfcc9:mean', 'audio_naive:mfcc10:mean', 'audio_naive:mfcc11:mean', 'audio_naive:mfcc12:mean', 'audio_naive:mfcc0:std', 'audio_naive:mfcc1:std', 'audio_naive:mfcc2:std', 'audio_naive:mfcc3:std', 'audio_naive:mfcc4:std', 'audio_naive:mfcc5:std', 'audio_naive:mfcc6:std', 'audio_naive:mfcc7:std', 'audio_naive:mfcc8:std', 'audio_naive:mfcc9:std', 'audio_naive:mfcc10:std', 'audio_naive:mfcc11:std', 'audio_naive:mfcc12:std', 'audio_properties:max_abs_value', 'audio_properties:normalization_multiplier', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 'lf_measurements:battery_level', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length'] WARNING:tensorflow:Layer lstm_28 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU. Epoch 1/2 4924/4924 [==============================] - 101s 20ms/step - loss: 0.7920 Epoch 2/2 4924/4924 [==============================] - 98s 20ms/step - loss: 1.0072
We will proceed with the lowest threshold (0) and the median fill option.¶
model_name = "threshold0_median_epoch1"
models_with_threshhold[model_name], features_to_include[model_name] = testing_threshold(0, -1, fill_type = 'median')
192 out of 279 Columns to remove due to excessive missing data: (the same 192 columns listed in the previous output) WARNING:tensorflow:Layer lstm_24 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU. 4924/4924 [==============================] - 103s 21ms/step - loss: 0.8103
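A minimal sketch (toy data, not from the dataset) of what the 'median' fill option chosen above does: each NaN is replaced with that column's median, which is robust to outliers compared with a zero fill that can distort standardized features.

```python
import pandas as pd
import numpy as np

# Toy frame with missing values; column names borrowed for illustration only.
df = pd.DataFrame({
    "raw_acc:magnitude_stats:mean": [1.0, np.nan, 3.0, 100.0],
    "lf_measurements:light":        [np.nan, 0.2, 0.4, 0.6],
})
# Median fill: NaNs become the per-column median (3.0 and 0.4 here).
filled = df.fillna(df.median())
```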
best_model_one_so_far = 'threshold0_median_epoch1'
print(features_to_include[best_model_one_so_far])
['timestamp', 'discrete:app_state:is_active', 'discrete:app_state:is_inactive', 'discrete:app_state:is_background', 'discrete:app_state:missing', 'discrete:battery_plugged:is_ac', 'discrete:battery_plugged:is_usb', 'discrete:battery_plugged:is_wireless', 'discrete:battery_plugged:missing', 'discrete:battery_state:is_unknown', 'discrete:battery_state:is_unplugged', 'discrete:battery_state:is_not_charging', 'discrete:battery_state:is_discharging', 'discrete:battery_state:is_charging', 'discrete:battery_state:is_full', 'discrete:battery_state:missing', 'discrete:on_the_phone:is_False', 'discrete:on_the_phone:is_True', 'discrete:on_the_phone:missing', 'discrete:ringer_mode:is_normal', 'discrete:ringer_mode:is_silent_no_vibrate', 'discrete:ringer_mode:is_silent_with_vibrate', 'discrete:ringer_mode:missing', 'discrete:wifi_status:is_not_reachable', 'discrete:wifi_status:is_reachable_via_wifi', 'discrete:wifi_status:is_reachable_via_wwan', 'discrete:wifi_status:missing', 'discrete:time_of_day:between0and6', 'discrete:time_of_day:between3and9', 'discrete:time_of_day:between6and12', 'discrete:time_of_day:between9and15', 'discrete:time_of_day:between12and18', 'discrete:time_of_day:between15and21', 'discrete:time_of_day:between18and24', 'discrete:time_of_day:between21and3']
import numpy as np
from sklearn.preprocessing import StandardScaler

def predict_from_df(df, model, features_to_include, look_back=3):
    """
    Process the given DataFrame and predict the next value using the LSTM model.

    Parameters:
    - df: DataFrame to process and predict from.
    - model: Trained LSTM model to use for predictions.
    - features_to_include: List of feature names to include in the prediction.
    - look_back: Number of previous time steps to use as input for predictions.

    Returns:
    - predictions: Predicted values for the next time step.
    """
    # Ensure the DataFrame contains the necessary features
    if not all(feature in df.columns for feature in features_to_include):
        raise ValueError("DataFrame missing required features")
    # Fill missing values with the column medians
    df_filled = df[features_to_include].fillna(df[features_to_include].median())
    # Scale features (note: this refits the scaler on the incoming slice;
    # reusing the training scaler would keep inputs on the training scale)
    scaler = StandardScaler().fit(df_filled)
    df_scaled = scaler.transform(df_filled)
    # Create overlapping sequences of length look_back
    sequences = np.array([df_scaled[i - look_back:i] for i in range(look_back, len(df_scaled) + 1)])
    # Predict using the LSTM model
    predictions = model.predict(sequences)
    return predictions

# Example of how to use the function
# Ensure 'model', 'features_to_include', and 'look_back' are defined as per your model's training setup
predictions = predict_from_df(df_user.iloc[1:4], models_with_threshhold[best_model_one_so_far], features_to_include[best_model_one_so_far], look_back=3)
print(predictions)
1/1 [==============================] - 0s 142ms/step [[-0.04484926]]
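Why the call above yields a single prediction: with `look_back = 3` and a 3-row slice (`df_user.iloc[1:4]`), the sliding-window loop in `predict_from_df` produces `len(df) - look_back + 1 = 1` sequence. A sketch with stand-in data (the 35-feature width matches the retained feature list printed earlier):

```python
import numpy as np

look_back = 3
df_scaled = np.zeros((3, 35))  # stand-in for 3 scaled rows of 35 features
# Same windowing expression as in predict_from_df
sequences = np.array([df_scaled[i - look_back:i]
                      for i in range(look_back, len(df_scaled) + 1)])
# sequences has shape (1, look_back, n_features): one window, one prediction
```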
# Calculate the total NaN count for each feature across all users
total_nan_counts = nan_counts_per_user.sum()
# Assuming `X_with_users` is your original DataFrame, the total number of
# entries for a single feature across all users is simply its row count
total_entries_per_feature = len(X_with_users)
# Calculate the percentage of missing data for each feature
percentage_missing = (total_nan_counts / total_entries_per_feature) * 100
# Threshold (in percent) for removing columns; 0 means drop any column
# that has any missing data at all
threshold = 0
# Identify columns that exceed this threshold
columns_to_remove = percentage_missing[percentage_missing > threshold].index.tolist()
print(len(columns_to_remove),"out of ", len(X_with_users.columns))
# Print out the columns to remove
print("Columns to remove due to excessive missing data:", columns_to_remove)
features = [feature for feature in features if feature not in columns_to_remove]
191 out of 279 Columns to remove due to excessive missing data: ['raw_acc:magnitude_stats:mean', 'raw_acc:magnitude_stats:std', 'raw_acc:magnitude_stats:moment3', 'raw_acc:magnitude_stats:moment4', 'raw_acc:magnitude_stats:percentile25', 'raw_acc:magnitude_stats:percentile50', 'raw_acc:magnitude_stats:percentile75', 'raw_acc:magnitude_stats:value_entropy', 'raw_acc:magnitude_stats:time_entropy', 'raw_acc:magnitude_spectrum:log_energy_band0', 'raw_acc:magnitude_spectrum:log_energy_band1', 'raw_acc:magnitude_spectrum:log_energy_band2', 'raw_acc:magnitude_spectrum:log_energy_band3', 'raw_acc:magnitude_spectrum:log_energy_band4', 'raw_acc:magnitude_spectrum:spectral_entropy', 'raw_acc:magnitude_autocorrelation:period', 'raw_acc:magnitude_autocorrelation:normalized_ac', 'raw_acc:3d:mean_x', 'raw_acc:3d:mean_y', 'raw_acc:3d:mean_z', 'raw_acc:3d:std_x', 'raw_acc:3d:std_y', 'raw_acc:3d:std_z', 'raw_acc:3d:ro_xy', 'raw_acc:3d:ro_xz', 'raw_acc:3d:ro_yz', 'proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 
'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 
'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 
'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'audio_naive:mfcc0:mean', 'audio_naive:mfcc1:mean', 'audio_naive:mfcc2:mean', 'audio_naive:mfcc3:mean', 'audio_naive:mfcc4:mean', 'audio_naive:mfcc5:mean', 'audio_naive:mfcc6:mean', 'audio_naive:mfcc7:mean', 'audio_naive:mfcc8:mean', 'audio_naive:mfcc9:mean', 'audio_naive:mfcc10:mean', 'audio_naive:mfcc11:mean', 'audio_naive:mfcc12:mean', 'audio_naive:mfcc0:std', 'audio_naive:mfcc1:std', 'audio_naive:mfcc2:std', 'audio_naive:mfcc3:std', 'audio_naive:mfcc4:std', 'audio_naive:mfcc5:std', 'audio_naive:mfcc6:std', 'audio_naive:mfcc7:std', 'audio_naive:mfcc8:std', 'audio_naive:mfcc9:std', 'audio_naive:mfcc10:std', 'audio_naive:mfcc11:std', 'audio_naive:mfcc12:std', 'audio_properties:max_abs_value', 'audio_properties:normalization_multiplier', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 'lf_measurements:battery_level', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient']
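The filtering logic above can be illustrated on a toy frame (hypothetical column names): with `threshold = 0`, any column containing even a single missing value is flagged for removal.

```python
import pandas as pd
import numpy as np

X = pd.DataFrame({
    "timestamp":     [1, 2, 3, 4],
    "sensor_a":      [0.1, np.nan, 0.3, 0.4],  # 25% missing -> removed
    "discrete:flag": [0, 1, 0, 1],             # 0% missing  -> kept
})
percentage_missing = X.isna().sum() / len(X) * 100
threshold = 0
columns_to_remove = percentage_missing[percentage_missing > threshold].index.tolist()
```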
nan_count = df_user[features].isna().sum()
nan_count_sorted = nan_count.sort_values(ascending=False)
print(len(nan_count_sorted))
nan_count_sorted
35
timestamp 0 discrete:wifi_status:missing 0 discrete:ringer_mode:is_silent_no_vibrate 0 discrete:ringer_mode:is_silent_with_vibrate 0 discrete:ringer_mode:missing 0 discrete:wifi_status:is_not_reachable 0 discrete:wifi_status:is_reachable_via_wifi 0 discrete:wifi_status:is_reachable_via_wwan 0 discrete:time_of_day:between0and6 0 discrete:on_the_phone:missing 0 discrete:time_of_day:between3and9 0 discrete:time_of_day:between6and12 0 discrete:time_of_day:between9and15 0 discrete:time_of_day:between12and18 0 discrete:time_of_day:between15and21 0 discrete:time_of_day:between18and24 0 discrete:ringer_mode:is_normal 0 discrete:on_the_phone:is_True 0 discrete:app_state:is_active 0 discrete:battery_plugged:missing 0 discrete:app_state:is_inactive 0 discrete:app_state:is_background 0 discrete:app_state:missing 0 discrete:battery_plugged:is_ac 0 discrete:battery_plugged:is_usb 0 discrete:battery_plugged:is_wireless 0 discrete:battery_state:is_unknown 0 discrete:on_the_phone:is_False 0 discrete:battery_state:is_unplugged 0 discrete:battery_state:is_not_charging 0 discrete:battery_state:is_discharging 0 discrete:battery_state:is_charging 0 discrete:battery_state:is_full 0 discrete:battery_state:missing 0 discrete:time_of_day:between21and3 0 dtype: int64
# First User
# Add more features as necessary
df_user = df_user[features].ffill()  # forward-fill missing values; fillna(method='ffill') is deprecated
scaler = StandardScaler()
df_user[features] = scaler.fit_transform(df_user)
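One caveat on the scaling step above: `fit_transform` computes statistics over all of the user's rows. If these rows are later split into train and test, the scaler should be fitted on the training rows only and reused on the test rows; a minimal numpy sketch (illustrative data and split sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # toy stand-in for one user's features
train, test = data[:80], data[80:]

# fit the scaling statistics on the training rows only...
mu, sigma = train.mean(axis=0), train.std(axis=0)
train_scaled = (train - mu) / sigma
# ...and reuse them on the test rows, so no test-set statistics leak into training
test_scaled = (test - mu) / sigma
```

The same pattern applies with `StandardScaler`: call `fit_transform` on the training split and plain `transform` on the test split.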
# Define LSTM model architecture
def create_lstm_model(input_shape):
    model = Sequential([
        LSTM(50, activation='relu', input_shape=input_shape),
        Dense(input_shape[1])  # one output per feature, so predictions can be inverse-scaled later
    ])
    model.compile(optimizer=Adam(learning_rate=0.001, clipvalue=0.5), loss='mse')  # apply gradient clipping
    return model
# Assuming timestamps are sorted; if not, sort user_data by timestamp here
df_user.sort_values('timestamp', inplace=True)
# Convert user_data to sequences for LSTM
look_back = 5
generator = TimeseriesGenerator(df_user[features].values, df_user[features].values,
length=look_back, batch_size=1)
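`TimeseriesGenerator` pairs each window of `look_back` consecutive rows with the row that follows it. The same pairing can be sketched in plain numpy to make the shapes explicit (toy data, not the ExtraSensory features):

```python
import numpy as np

def make_windows(values, look_back):
    # Build (X, y) pairs the way TimeseriesGenerator does:
    # each sample is `look_back` consecutive rows, the target is the row after them.
    X, y = [], []
    for i in range(look_back, len(values)):
        X.append(values[i - look_back:i])
        y.append(values[i])
    return np.array(X), np.array(y)

data = np.arange(20, dtype=float).reshape(10, 2)  # 10 timesteps, 2 features
X_seq, y_seq = make_windows(data, look_back=5)    # X_seq: (5, 5, 2), y_seq: (5, 2)
```

This also shows why an input of n rows yields only n - look_back training samples.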
# Create and train LSTM model on the selected user's data
model = create_lstm_model((look_back, len(features)))
model.fit(generator, epochs=2, verbose=1) # Adjust epochs and verbosity as needed
WARNING:tensorflow:Layer lstm_9 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
Epoch 1/2
2024-02-12 10:47:55.092965: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] model_pruner failed: INVALID_ARGUMENT: Graph does not contain terminal node Adam/AssignAddVariableOp_6.
4922/4922 [==============================] - 169s 34ms/step - loss: 0.7789
Epoch 2/2
4922/4922 [==============================] - 167s 34ms/step - loss: 0.7444
<keras.src.callbacks.History at 0x325e19750>
df_test = X_with_users[X_with_users['user_id'] == users[3]].copy()  # .copy() avoids SettingWithCopyWarning on later writes
# Predict and fill missing values function
def predict_and_fill_missing_values(data, model, feature_columns, look_back):
    # start at look_back so a full window of previous rows is always available
    for i in range(look_back, len(data)):
        if pd.isnull(data.iloc[i][feature_columns]).any():  # check if any feature value is missing
            input_seq = data[feature_columns].iloc[i - look_back:i].values
            input_seq = scaler.transform(input_seq)  # normalize with the previously fitted scaler
            input_seq = input_seq.reshape((1, look_back, len(feature_columns)))
            predicted_value = model.predict(input_seq)
            # un-scale the prediction and fill the row with it
            data.loc[data.index[i], feature_columns] = scaler.inverse_transform(predicted_value)[0]
    return data
# Fill missing values in the original DataFrame
df_filled_with_predictions = predict_and_fill_missing_values(df_test, model, features, look_back)
for column in features:
    # positional indices of rows with missing values in the current column
    missing_positions = np.flatnonzero(df_test[column].isnull().values)
    for pos in missing_positions:
        # check if there are enough previous data points for a full look-back window
        if pos >= look_back:
            # prepare the input sequence for prediction
            # (all features in the 'features' list are used for prediction)
            input_sequence = df_test[features].iloc[pos - look_back:pos].values
            input_sequence = scaler.transform(input_sequence)  # scale the sequence according to previous scaler fit
            input_sequence = input_sequence.reshape((1, look_back, len(features)))
            # predict all features for the next step, then un-scale
            predicted_value = scaler.inverse_transform(model.predict(input_sequence))
            # update the DataFrame with the predicted value for the missing column only;
            # adjust accordingly if predicting a single target instead of all features
            df_test.iloc[pos, df_test.columns.get_loc(column)] = predicted_value[0, features.index(column)]
y = combined_csv_data[output_columns]
def missing_value_check(df, df_name="DataFrame"):
    missing_values = df.isna().sum()
    missing_values = missing_values[missing_values > 0]
    if len(missing_values) > 0:
        plt.figure(figsize=(15, 60))
        missing_values.sort_values(ascending=True).plot(kind='barh')
        plt.title(f'Missing Values in Each Column ({df_name})')
        plt.xlabel('Number of Missing Values')  # horizontal bars: counts run along the x-axis
        plt.ylabel('Columns')
        plt.show()
    else:
        print('All the missing values have been covered.')
missing_value_check(X, 'X')
X = X.fillna(-1)  # avoid inplace fillna on a slice (SettingWithCopyWarning)
#TODO: Need to find the best way to add the missing values
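As one possible answer to the TODO above, per-column median imputation is a common baseline: the filled values stay on each feature's own scale instead of introducing a -1 sentinel. A minimal pandas sketch on toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, 2.0, 4.0]})
# fill each column with its own median, so imputed values stay on that feature's scale
df_imputed = df.fillna(df.median())
```

Whether the median, the mean, or a model-based fill is best depends on the feature; sensor features with heavy tails usually favor the median.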
missing_value_check(y, 'y')
y = y.fillna(0)  # avoid inplace fillna on a slice (SettingWithCopyWarning)
missing_value_check(y, 'y')
All the missing values have been covered.
Model Testing¶
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.Input(shape=(len(input_columns),)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(len(output_columns))
])
model.compile(optimizer='adam', loss='mse')
epochs = 3
batch_size = 32
history = model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, validation_split=0.2)
Epoch 1/3
514/514 [==============================] - 10s 18ms/step - loss: 265207989403648.0000 - val_loss: 16220.5537
Epoch 2/3
514/514 [==============================] - 2s 5ms/step - loss: 16198.3203 - val_loss: 15281.2402
Epoch 3/3
514/514 [==============================] - 2s 5ms/step - loss: 493791648.0000 - val_loss: 82575944.0000
# Save the entire model to a file (HDF5 is legacy; Keras recommends the native format, e.g. model.save('tf_model_v2.keras'))
model.save("tf_model_v2.h5")
/Users/zaina/miniconda3/lib/python3.10/site-packages/keras/src/engine/training.py:3103: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')`.
saving_api.save_model(
print(y_test[1:3])
y_pred_0 = model.predict(X_test[1:3])
print(y_pred_0)
label:LYING_DOWN label:SITTING label:FIX_walking label:FIX_running \
4209 1.0 0.0 0.0 0.0
3455 0.0 0.0 0.0 0.0
label:BICYCLING label:SLEEPING label:LAB_WORK label:IN_CLASS \
4209 0.0 1.0 0.0 0.0
3455 1.0 0.0 0.0 0.0
label:IN_A_MEETING label:LOC_main_workplace label:OR_indoors \
4209 0.0 0.0 1.0
3455 0.0 0.0 0.0
label:OR_outside label:IN_A_CAR label:ON_A_BUS \
4209 0.0 0.0 0.0
3455 0.0 0.0 0.0
label:DRIVE_-_I_M_THE_DRIVER label:DRIVE_-_I_M_A_PASSENGER \
4209 0.0 0.0
3455 0.0 0.0
label:LOC_home label:FIX_restaurant label:PHONE_IN_POCKET \
4209 0.0 0.0 0.0
3455 0.0 0.0 0.0
label:OR_exercise label:COOKING label:SHOPPING label:STROLLING \
4209 0.0 0.0 0.0 0.0
3455 1.0 0.0 0.0 0.0
label:DRINKING__ALCOHOL_ label:BATHING_-_SHOWER label:CLEANING \
4209 0.0 0.0 0.0
3455 0.0 0.0 0.0
label:DOING_LAUNDRY label:WASHING_DISHES label:WATCHING_TV \
4209 0.0 0.0 0.0
3455 0.0 0.0 0.0
label:SURFING_THE_INTERNET label:AT_A_PARTY label:AT_A_BAR \
4209 0.0 0.0 0.0
3455 0.0 0.0 0.0
label:LOC_beach label:SINGING label:TALKING label:COMPUTER_WORK \
4209 0.0 0.0 0.0 0.0
3455 0.0 0.0 0.0 0.0
label:EATING label:TOILET label:GROOMING label:DRESSING \
4209 0.0 0.0 0.0 0.0
3455 0.0 0.0 0.0 0.0
label:AT_THE_GYM label:STAIRS_-_GOING_UP label:STAIRS_-_GOING_DOWN \
4209 0.0 0.0 0.0
3455 0.0 0.0 0.0
label:ELEVATOR label:OR_standing label:AT_SCHOOL label:PHONE_IN_HAND \
4209 0.0 0.0 0.0 0.0
3455 0.0 0.0 0.0 0.0
label:PHONE_IN_BAG label:PHONE_ON_TABLE label:WITH_CO-WORKERS \
4209 0.0 0.0 0.0
3455 0.0 0.0 0.0
label:WITH_FRIENDS
4209 0.0
3455 1.0
1/1 [==============================] - 0s 46ms/step
[[-2.59731178e+01 1.27983383e+02 9.80889738e-01 -1.23986526e+02
-2.96019501e+02 8.20305176e+01 1.28011810e+02 3.19692898e+01
6.39938889e+01 -1.47975021e+02 4.20986694e+02 2.19858627e+01
-2.10005844e+02 1.11979340e+02 2.07982361e+02 5.60200195e+01
-7.19873886e+01 3.59612007e+01 3.61966370e+02 3.60004822e+02
-6.64010315e+02 5.60046501e+01 -3.36010376e+02 2.23978058e+02
7.60032578e+01 -6.50400039e+04 2.27999527e+02 -9.59694061e+01
1.32010544e+02 2.92007446e+02 1.59852057e+01 1.11990677e+02
-8.99841309e+01 -1.27992393e+02 -1.84007034e+02 2.01966812e+02
-1.47997574e+02 6.40029984e+01 -2.02013046e+02 4.00207596e+01
-3.12010712e+02 1.19813356e+01 1.11974045e+02 2.47998413e+02
-2.23997925e+02 1.36005981e+02 -1.56002060e+02 8.00049820e+01
2.00016113e+02 5.24007385e+02 8.80009766e+01]
[-2.19731178e+01 1.31983383e+02 1.49808893e+01 -1.13986526e+02
-3.06019501e+02 7.00305176e+01 1.28011810e+02 7.96929073e+00
8.79938889e+01 -1.59975021e+02 4.32986694e+02 2.59858627e+01
-2.18005844e+02 1.31979340e+02 1.79982361e+02 6.40200195e+01
-7.99873886e+01 3.99612007e+01 3.43966370e+02 3.62004822e+02
-6.38010315e+02 7.20046463e+01 -3.24010376e+02 2.15978058e+02
6.40032578e+01 -6.50140039e+04 2.43999527e+02 -9.19694061e+01
1.50010544e+02 3.00007446e+02 -6.01479435e+00 1.00990677e+02
-1.09984131e+02 -1.35992401e+02 -1.76007034e+02 2.07966812e+02
-1.23997574e+02 7.20029984e+01 -1.86013046e+02 2.07594633e-02
-3.36010712e+02 3.09813347e+01 9.59740448e+01 2.31998413e+02
-2.23997925e+02 1.56005981e+02 -1.60002060e+02 8.40049820e+01
1.96016113e+02 5.26007385e+02 7.60009766e+01]]
loss = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss:.4f}")
# Predictions
y_pred = model.predict(X_test)
276/276 [==============================] - 1s 4ms/step - loss: 82582624.0000
Test Loss: 82582624.0000
276/276 [==============================] - 1s 2ms/step
print("X_test shape:", X_test.shape)
print("y_test shape:", y_test.shape)
X_test shape: (8806, 226)
y_test shape: (8806, 51)
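The very large MSE values above are a symptom of regressing unbounded linear outputs against 0/1 labels. Since every output column here is binary, a sigmoid output layer with binary cross-entropy (for example `Dense(len(output_columns), activation='sigmoid')` compiled with `loss='binary_crossentropy'`, assuming the same Keras setup) would keep predictions in (0, 1) and losses on an interpretable scale. A numpy sketch of that loss on toy logits:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # clip to avoid log(0); average over all samples and label columns
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

logits = np.array([[2.0, -1.5], [0.3, 4.0]])   # toy pre-activation outputs
probs = sigmoid(logits)                        # squashed into (0, 1)
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = binary_crossentropy(y_true, probs)
```

With probabilities bounded in (0, 1), the per-example loss can no longer explode the way the raw MSE did.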
# Extracting loss and validation loss values
training_loss = history.history['loss']
validation_loss = history.history['val_loss']
# Creating epoch numbers (starting from 1)
epochs_range = range(1, epochs + 1)
# Plotting the training and validation loss
plt.figure(figsize=(8, 4))
plt.plot(epochs_range, training_loss, 'bo-', label='Training Loss')
plt.plot(epochs_range, validation_loss, 'ro-', label='Validation Loss')
plt.title('Training and Validation Loss per Epoch')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
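Because the training losses here span roughly ten orders of magnitude (2.7e14 down to 1.6e4), a linear y-axis hides everything except the first spike; a log scale makes the later epochs readable. A small matplotlib sketch (toy loss values):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# toy loss values spanning many orders of magnitude, like the run above
training_loss = [2.65e14, 1.62e4, 4.94e8]
epochs_range = range(1, len(training_loss) + 1)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(epochs_range, training_loss, 'bo-', label='Training Loss')
ax.set_yscale('log')  # linear scale would flatten everything but the first spike
ax.set_xlabel('Epoch')
ax.set_ylabel('Loss (log scale)')
ax.legend()
```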
!pip3 freeze > requirements.txt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np
# Assuming combined_csv_data is your DataFrame and it has been loaded already
# Before splitting, ensure there are no NaN values in your output columns
for col in output_columns:
    combined_csv_data[col] = combined_csv_data[col].fillna(0)  # replace NaN in y with 0, if appropriate; avoids chained-assignment inplace
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(combined_csv_data[input_columns], combined_csv_data[output_columns], test_size=0.2, random_state=42)
predictions = {}
for output_col in output_columns:
    # Create a pipeline with an imputer (to fill missing values in features) and logistic regression
    pipeline = make_pipeline(
        SimpleImputer(strategy='mean'),      # fills missing X values with the mean of each column
        LogisticRegression(max_iter=1000)    # increased max_iter to help convergence
    )
    # Fit the pipeline to the training data
    pipeline.fit(X_train, y_train[output_col])
    # New data for prediction. This example is simplified and should be replaced with actual new data.
    # Ensure X_new has the same number of features as X_train. Here, we use np.nan as placeholders.
    X_new = np.array([[0.5, 1.2] + [np.nan] * 224])  # adjusted to match the feature count of the trained model
    # Predicting the probability for the given X_new
    pred_prob = pipeline.predict_proba(X_new)[0][1]
    # Storing the prediction
    predictions[output_col] = pred_prob
# Displaying the predicted probabilities
for y_col, prob in predictions.items():
    print(f"Predicted probability for {y_col}: {prob:.2%}")
/Users/zaina/miniconda3/lib/python3.10/site-packages/sklearn/impute/_base.py:565: UserWarning: Skipping features without any observed values: ['location:min_altitude' 'location:max_altitude'
'location:best_vertical_accuracy' 'lf_measurements:proximity'
'lf_measurements:screen_brightness']. At least one non-missing value is needed for imputation with strategy='mean'.
warnings.warn(
/Users/zaina/miniconda3/lib/python3.10/site-packages/sklearn/base.py:493: UserWarning: X does not have valid feature names, but SimpleImputer was fitted with feature names
warnings.warn(
(the two warnings above repeat once per output column; repeats omitted)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[26], line 29
     28 # Fit the pipeline to the training data
---> 29 pipeline.fit(X_train, y_train[output_col])
...
File ~/miniconda3/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:1246, in LogisticRegression.fit(self, X, y, sample_weight)
   1245 if n_classes < 2:
-> 1246     raise ValueError(
   1247         "This solver needs samples of at least 2 classes"
   1248         " in the data, but the data contains only one"
   1249         " class: %r" % classes_[0]
   1250     )

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0.0
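The ValueError above happens because, after filling NaNs with 0, some label columns contain only a single class in the training split, and LogisticRegression cannot fit a one-class target. One fix is to filter such columns out before training; a minimal pandas sketch (toy labels):

```python
import pandas as pd

def trainable_columns(y_train, output_columns):
    # LogisticRegression needs at least two classes in each target column;
    # skip columns that are constant in this training split
    return [col for col in output_columns if y_train[col].nunique() >= 2]

y_demo = pd.DataFrame({
    'label:SITTING':    [0.0, 1.0, 0.0, 1.0],
    'label:AT_THE_GYM': [0.0, 0.0, 0.0, 0.0],  # single class: fit would raise ValueError
})
usable = trainable_columns(y_demo, list(y_demo.columns))
```

Skipped columns can then be given a constant predicted probability (e.g. the label's base rate) instead of a fitted model.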
len(predictions)
51
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore", message="X does not have valid feature names, but SimpleImputer was fitted with feature names")
# Fill missing values in output columns with a default value (e.g., 0)
y = y.fillna(0)  # avoid inplace fillna on a slice (SettingWithCopyWarning)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Dictionary to store the pipeline for each output column
pipelines = {}
# Train a pipeline for each output column
for output_col in output_columns:
    pipeline = make_pipeline(
        SimpleImputer(strategy='mean'),     # impute missing values
        LogisticRegression(max_iter=1000)   # logistic regression
    )
    # Fit the pipeline on the training data
    pipeline.fit(X_train, y_train[output_col])
    pipelines[output_col] = pipeline
# Predicting the probabilities for each row in X_test
predictions = {col: [] for col in output_columns}  # initialize dictionary to store predictions
for index, row in X_test.iterrows():
    for output_col in output_columns:
        # Predict the probability for the current row and output column
        pred_prob = pipelines[output_col].predict_proba(row.values.reshape(1, -1))[0][1]
        predictions[output_col].append(pred_prob)
# Optionally, print out the predicted probabilities for the first few rows of X_test
for i, (index, row) in enumerate(X_test.iterrows()):
    if i >= 5:  # limit output to the first 5 rows
        break
    print(f"Predictions for row {index}:")
    for output_col in output_columns:
        print(f"  {output_col}: {predictions[output_col][i]:.2%}")
    print()  # newline for readability
Predictions for row 225548:
  label:LYING_DOWN: 45.00%
  label:SITTING: 36.16%
  label:FIX_walking: 2.10%
  label:FIX_running: 0.29%
  label:BICYCLING: 0.49%
  label:SLEEPING: 22.02%
  label:LAB_WORK: 0.57%
  label:IN_CLASS: 1.63%
  label:IN_A_MEETING: 1.36%
  label:LOC_main_workplace: 9.00%
  label:OR_indoors: 63.17%
  label:OR_outside: 3.21%
  label:IN_A_CAR: 0.65%
  label:ON_A_BUS: 0.47%
  label:DRIVE_-_I_M_THE_DRIVER: 2.12%
  label:DRIVE_-_I_M_A_PASSENGER: 0.67%
  label:LOC_home: 49.30%
  label:FIX_restaurant: 0.55%
  label:PHONE_IN_POCKET: 6.24%
  label:OR_exercise: 0.83%
  label:COOKING: 1.06%
  label:SHOPPING: 0.49%
  label:STROLLING: 0.21%
  label:DRINKING__ALCOHOL_: 0.39%
  label:BATHING_-_SHOWER: 0.55%
  label:CLEANING: 1.02%
  label:DOING_LAUNDRY: 0.15%
  label:WASHING_DISHES: 0.33%
  label:WATCHING_TV: 3.51%
  label:SURFING_THE_INTERNET: 5.18%
  label:AT_A_PARTY: 0.39%
  label:AT_A_BAR: 0.15%
  label:LOC_beach: 0.09%
  label:SINGING: 0.17%
  label:TALKING: 9.61%
  label:COMPUTER_WORK: 8.82%
  label:EATING: 2.66%
  label:TOILET: 0.56%
  label:GROOMING: 0.80%
  label:DRESSING: 0.59%
  label:AT_THE_GYM: 0.09%
  label:STAIRS_-_GOING_UP: 0.06%
  label:STAIRS_-_GOING_DOWN: 0.20%
  label:ELEVATOR: 0.05%
  label:OR_standing: 10.00%
  label:AT_SCHOOL: 4.26%
  label:PHONE_IN_HAND: 3.87%
  label:PHONE_IN_BAG: 1.31%
  label:PHONE_ON_TABLE: 30.48%
  label:WITH_CO-WORKERS: 1.65%
  label:WITH_FRIENDS: 3.26%

(similar blocks follow for rows 22229, 257345, 291119, and 47990)
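The `iterrows` loop above calls `predict_proba` once per row per label, i.e. hundreds of thousands of single-row calls. Each fitted pipeline can score the whole test frame in one vectorized call instead. A sketch, using a small stand-in object in place of a fitted sklearn pipeline so the example is self-contained:

```python
import numpy as np
import pandas as pd

def batch_predict_proba(pipelines, X_test, output_columns):
    # one predict_proba call per label column, not per row;
    # returns a DataFrame of P(label=1) aligned with X_test's index
    return pd.DataFrame(
        {col: pipelines[col].predict_proba(X_test)[:, 1] for col in output_columns},
        index=X_test.index,
    )

class ConstantProb:
    """Stand-in for a fitted pipeline (illustration only)."""
    def __init__(self, p):
        self.p = p
    def predict_proba(self, X):
        n = len(X)
        return np.column_stack([np.full(n, 1 - self.p), np.full(n, self.p)])

X_test_demo = pd.DataFrame(np.zeros((3, 2)))
probs = batch_predict_proba({'label:SITTING': ConstantProb(0.3)},
                            X_test_demo, ['label:SITTING'])
```

With the real `pipelines` dictionary, this reduces the scoring step to one `predict_proba` call per label (51 calls total) instead of one per row per label.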
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np
import warnings

warnings.filterwarnings("ignore", message="X does not have valid feature names, but SimpleImputer was fitted with feature names")

# Assuming X, y, and output_columns have already been built from
# combined_csv_data (your combined DataFrame) in an earlier cell.
# Fill missing values in the output columns with a default value (0).
# Reassigning, rather than calling fillna(inplace=True) on a slice,
# avoids pandas' SettingWithCopyWarning.
y = y.fillna(0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Dictionary to store the fitted pipeline for each output column
pipelines = {}

# Train one pipeline per output column
for output_col in output_columns:
    pipeline = make_pipeline(
        SimpleImputer(strategy='mean'),    # Impute missing feature values
        LogisticRegression(max_iter=1000)  # Binary logistic regression
    )
    # Fit the pipeline on the training data
    pipeline.fit(X_train, y_train[output_col])
    pipelines[output_col] = pipeline

# Predict the probability of each label for every row in X_test
predictions = {col: [] for col in output_columns}
for index, row in X_test.iterrows():
    for output_col in output_columns:
        # Probability of the positive class for this row and label
        pred_prob = pipelines[output_col].predict_proba(row.values.reshape(1, -1))[0][1]
        predictions[output_col].append(pred_prob)

# Print the predicted probabilities for the first few rows of X_test
for i, (index, row) in enumerate(X_test.iterrows()):
    if i >= 5:  # Limit output to the first 5 rows
        break
    print(f"Predictions for row {index}:")
    for output_col in output_columns:
        print(f"  {output_col}: {predictions[output_col][i]:.2%}")
    print()  # Blank line for readability
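The nested loop above scores one row at a time, so it makes `len(X_test) × len(output_columns)` separate calls to `predict_proba`. Calling it once per label over the whole test matrix yields the same probabilities far faster. A minimal sketch on synthetic data — `X`, `y`, and the two label names below are placeholders, not the notebook's actual variables:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-ins for the notebook's X / y / output_columns
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"f{i}" for i in range(5)])
output_columns = ["label:SITTING", "label:WALKING"]
y = pd.DataFrame({c: rng.integers(0, 2, size=200) for c in output_columns})

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# One pipeline per label, as in the cell above
pipelines = {
    col: make_pipeline(SimpleImputer(strategy="mean"),
                       LogisticRegression(max_iter=1000)).fit(X_train, y_train[col])
    for col in output_columns
}

# One predict_proba call per label over the entire test matrix;
# column index 1 is the probability of the positive class.
prob_df = pd.DataFrame(
    {col: pipelines[col].predict_proba(X_test)[:, 1] for col in output_columns},
    index=X_test.index,
)
```

Each column of `prob_df` then holds the positive-class probability for one label, aligned with `X_test`'s index.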
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame (raised at `y.fillna`; see the pandas indexing docs on returning a view versus a copy — the fix is to reassign, e.g. `y = y.fillna(0)`, instead of mutating the slice in place).

KeyboardInterrupt: the cell was interrupted manually while `pipeline.fit(X_train, y_train[output_col])` was running, inside `SimpleImputer`'s input validation.
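The SettingWithCopyWarning above comes from mutating a column selection in place. A minimal reproduction of the problem and the copy-safe fix — the toy frame and column name here are illustrative, not the notebook's data:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the combined data
df = pd.DataFrame({"feat": [0.5, 0.7],
                   "label:SITTING": [1.0, np.nan]})

# What the warning flags: fillna(inplace=True) on a selection, which may
# be a view or a copy of df:
#   y = df[["label:SITTING"]]
#   y.fillna(0, inplace=True)   # -> SettingWithCopyWarning

# Copy-safe pattern: take an explicit copy, then reassign the result.
y = df[["label:SITTING"]].copy()
y = y.fillna(0)
```

The original `df` is left untouched, and `y` carries the filled values without any warning.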
combined_csv_data.head()
NameError: name 'combined_csv_data' is not defined (the combined DataFrame had not been loaded in this kernel session).
import pandas as pd
import numpy as np
from threading import Timer

class Phone:
    def __init__(self, data_df):
        self.data_df = data_df  # data_df is a DataFrame loaded with user data

    def collect_data(self, userid):
        """Collects a random data row for a given user."""
        user_data = self.data_df[self.data_df['user_id'] == userid].sample(n=1)
        return user_data

    def process_data(self, userid, model):
        """Processes data using a specified model."""
        data = self.collect_data(userid)
        # `model` is a callable passed in to process the data
        processed_data = model(data)
        return processed_data

    def send_data(self, userid, interval, server):
        """Periodically sends data at the specified interval (in seconds)."""
        data = self.collect_data(userid)
        server.store_update_data(data)
        Timer(interval, self.send_data, args=[userid, interval, server]).start()

class Server:
    def __init__(self):
        self.storage_df = pd.DataFrame()  # Separate DataFrame for storing data

    def request_data(self, phone, userid, raw=True):
        """Requests data from the Phone class."""
        if raw:
            return phone.collect_data(userid)
        else:
            return phone.process_data(userid, self.process_data)  # self.process_data as a placeholder model

    def process_data(self, data):
        """Processes data (placeholder for ML models or other transformations)."""
        processed_data = data  # Simplified for demonstration
        return processed_data

    def store_update_data(self, data):
        """Stores or updates data in a separate DataFrame."""
        self.storage_df = pd.concat([self.storage_df, data], ignore_index=True)

data_df = pd.read_csv('ExtraSensory_Combined_User_Data.csv')
phone = Phone(data_df)
server = Server()

userid = "81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0"
raw_data = server.request_data(phone, userid, raw=True)
processed_data = server.request_data(phone, userid, raw=False)
server.store_update_data(raw_data)
server.store_update_data(processed_data)
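`send_data` reschedules itself with a fresh `Timer` on every call and exposes no way to stop the loop. One possible sketch of a cancellable variant — `PeriodicSender` and the stub classes below are illustrative, not part of the original design:

```python
from threading import Timer

# Stand-ins so the sketch runs on its own; in the notebook these would be
# the Phone and Server instances defined above.
class _StubPhone:
    def collect_data(self, userid):
        return {"user_id": userid}

class _StubServer:
    def __init__(self):
        self.received = []
    def store_update_data(self, data):
        self.received.append(data)

class PeriodicSender:
    """Like Phone.send_data, but keeps a handle on the pending Timer so it can be cancelled."""
    def __init__(self, phone, server, userid, interval):
        self.phone, self.server = phone, server
        self.userid, self.interval = userid, interval
        self._timer = None
        self._stopped = True

    def _tick(self):
        if self._stopped:
            return
        self.server.store_update_data(self.phone.collect_data(self.userid))
        self._timer = Timer(self.interval, self._tick)
        self._timer.daemon = True  # don't keep the process alive
        self._timer.start()

    def start(self):
        self._stopped = False
        self._tick()  # first send happens immediately

    def stop(self):
        self._stopped = True
        if self._timer is not None:
            self._timer.cancel()

phone, server = _StubPhone(), _StubServer()
sender = PeriodicSender(phone, server, "u1", interval=60.0)
sender.start()  # one immediate send; the next is scheduled in 60 s
sender.stop()   # cancels the pending Timer
```

Keeping the `Timer` handle is what makes `stop()` possible; the original `send_data` discards it, so the only way to end the loop is to kill the process.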
# df = X and y
# Format data into our structure after prediction
# Tableau
# User - Dynamic
# User activity against timestamps
# Probability related - select user and time (log reg) probabilities
server.storage_df
| | timestamp | raw_acc:magnitude_stats:mean | raw_acc:magnitude_stats:std | raw_acc:magnitude_stats:moment3 | raw_acc:magnitude_stats:moment4 | raw_acc:magnitude_stats:percentile25 | raw_acc:magnitude_stats:percentile50 | raw_acc:magnitude_stats:percentile75 | raw_acc:magnitude_stats:value_entropy | raw_acc:magnitude_stats:time_entropy | ... | label:ELEVATOR | label:OR_standing | label:AT_SCHOOL | label:PHONE_IN_HAND | label:PHONE_IN_BAG | label:PHONE_ON_TABLE | label:WITH_CO-WORKERS | label:WITH_FRIENDS | label_source | user_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1446541934 | 1.020442 | 0.001224 | -0.000816 | 0.001642 | 1.019684 | 1.020508 | 1.021267 | 2.513081 | 6.684611 | ... | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 2 | 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0 |
| 1 | 1446295467 | 1.035055 | 0.004175 | 0.003772 | 0.005590 | 1.032436 | 1.034281 | 1.036872 | 2.462429 | 6.684604 | ... | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 2 | 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0 |
2 rows × 279 columns
data_df.head()
| | timestamp | raw_acc:magnitude_stats:mean | raw_acc:magnitude_stats:std | raw_acc:magnitude_stats:moment3 | raw_acc:magnitude_stats:moment4 | raw_acc:magnitude_stats:percentile25 | raw_acc:magnitude_stats:percentile50 | raw_acc:magnitude_stats:percentile75 | raw_acc:magnitude_stats:value_entropy | raw_acc:magnitude_stats:time_entropy | ... | label:ELEVATOR | label:OR_standing | label:AT_SCHOOL | label:PHONE_IN_HAND | label:PHONE_IN_BAG | label:PHONE_ON_TABLE | label:WITH_CO-WORKERS | label:WITH_FRIENDS | label_source | user_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1446141691 | 1.009726 | 0.002838 | -0.002296 | 0.005568 | 1.008208 | 1.009735 | 1.011174 | 1.572784 | 6.684608 | ... | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 2 | 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0 |
| 1 | 1446141752 | 1.009822 | 0.004624 | 0.003040 | 0.008459 | 1.007704 | 1.009619 | 1.011857 | 1.754729 | 6.684601 | ... | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 2 | 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0 |
| 2 | 1446141805 | 1.009667 | 0.004781 | -0.007802 | 0.014457 | 1.008038 | 1.009772 | 1.011139 | 1.012852 | 6.684600 | ... | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 2 | 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0 |
| 3 | 1446141873 | 1.008839 | 0.003543 | 0.001831 | 0.007082 | 1.007134 | 1.008803 | 1.010433 | 1.511878 | 6.684606 | ... | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 2 | 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0 |
| 4 | 1446141925 | 1.008193 | 0.001753 | -0.000744 | 0.002439 | 1.007142 | 1.008234 | 1.009350 | 2.347186 | 6.684610 | ... | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 2 | 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0 |
5 rows × 279 columns
len(combined_csv_data)
377346
nan_count_full = combined_csv_data[features].isna().sum()
nan_count_sorted_full = nan_count_full.sort_values(ascending=False)
nan_count_sorted_full
lf_measurements:proximity 6407 location:best_vertical_accuracy 6407 location:max_altitude 6407 location:min_altitude 6407 lf_measurements:screen_brightness 6407 watch_heading:std_sin 228 watch_heading:mean_sin 228 watch_heading:mom4_cos 228 watch_heading:mom3_cos 228 watch_heading:std_cos 228 watch_heading:mean_cos 228 watch_heading:mom3_sin 228 watch_heading:mom4_sin 228 watch_heading:entropy_8bins 228 location:max_speed 34 location:min_speed 34 watch_acceleration:magnitude_autocorrelation:period 7 watch_acceleration:magnitude_spectrum:log_energy_band3 7 watch_acceleration:magnitude_spectrum:log_energy_band4 7 watch_acceleration:magnitude_spectrum:spectral_entropy 7 watch_acceleration:spectrum:x_log_energy_band3 7 watch_acceleration:magnitude_autocorrelation:normalized_ac 7 watch_acceleration:3d:mean_x 7 watch_acceleration:3d:mean_y 7 watch_acceleration:magnitude_spectrum:log_energy_band1 7 watch_acceleration:magnitude_spectrum:log_energy_band2 7 watch_acceleration:magnitude_stats:moment3 7 watch_acceleration:magnitude_spectrum:log_energy_band0 7 watch_acceleration:magnitude_stats:time_entropy 7 watch_acceleration:magnitude_stats:value_entropy 7 watch_acceleration:magnitude_stats:percentile75 7 watch_acceleration:magnitude_stats:percentile50 7 watch_acceleration:magnitude_stats:percentile25 7 watch_acceleration:magnitude_stats:moment4 7 watch_acceleration:3d:std_x 7 watch_acceleration:magnitude_stats:std 7 watch_acceleration:magnitude_stats:mean 7 watch_acceleration:3d:mean_z 7 watch_acceleration:3d:ro_xy 7 watch_acceleration:3d:std_y 7 watch_acceleration:spectrum:y_log_energy_band4 7 watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4 7 watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3 7 watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2 7 watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1 7 watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0 7 
watch_acceleration:spectrum:z_log_energy_band4 7 watch_acceleration:spectrum:z_log_energy_band3 7 watch_acceleration:3d:std_z 7 watch_acceleration:spectrum:z_log_energy_band1 7 watch_acceleration:spectrum:z_log_energy_band0 7 watch_acceleration:spectrum:z_log_energy_band2 7 watch_acceleration:spectrum:y_log_energy_band3 7 watch_acceleration:spectrum:x_log_energy_band1 7 watch_acceleration:spectrum:y_log_energy_band2 7 watch_acceleration:3d:ro_yz 7 watch_acceleration:spectrum:x_log_energy_band0 7 watch_acceleration:3d:ro_xz 7 watch_acceleration:spectrum:x_log_energy_band2 7 watch_acceleration:spectrum:x_log_energy_band4 7 watch_acceleration:spectrum:y_log_energy_band0 7 watch_acceleration:spectrum:y_log_energy_band1 7 location:best_horizontal_accuracy 0 location_quick_features:mean_abs_long_deriv 0 location:diameter 0 location:log_diameter 0 location_quick_features:std_lat 0 location_quick_features:std_long 0 location_quick_features:lat_change 0 location_quick_features:long_change 0 location_quick_features:mean_abs_lat_deriv 0 audio_naive:mfcc0:mean 0 location:log_latitude_range 0 discrete:on_the_phone:is_False 0 discrete:ringer_mode:missing 0 discrete:ringer_mode:is_silent_with_vibrate 0 discrete:ringer_mode:is_silent_no_vibrate 0 discrete:ringer_mode:is_normal 0 discrete:on_the_phone:missing 0 discrete:on_the_phone:is_True 0 discrete:battery_state:missing 0 audio_naive:mfcc1:mean 0 discrete:battery_state:is_full 0 discrete:battery_state:is_charging 0 discrete:battery_state:is_discharging 0 discrete:battery_state:is_not_charging 0 discrete:battery_state:is_unplugged 0 discrete:battery_state:is_unknown 0 discrete:wifi_status:is_not_reachable 0 discrete:wifi_status:is_reachable_via_wifi 0 discrete:wifi_status:is_reachable_via_wwan 0 discrete:wifi_status:missing 0 lf_measurements:light 0 lf_measurements:pressure 0 lf_measurements:proximity_cm 0 lf_measurements:relative_humidity 0 lf_measurements:battery_level 0 lf_measurements:temperature_ambient 0 
discrete:time_of_day:between0and6 0 discrete:time_of_day:between3and9 0 discrete:time_of_day:between6and12 0 discrete:time_of_day:between9and15 0 discrete:time_of_day:between12and18 0 discrete:time_of_day:between15and21 0 discrete:time_of_day:between18and24 0 discrete:battery_plugged:missing 0 discrete:battery_plugged:is_wireless 0 discrete:battery_plugged:is_usb 0 audio_naive:mfcc3:std 0 audio_naive:mfcc2:mean 0 audio_naive:mfcc3:mean 0 audio_naive:mfcc4:mean 0 audio_naive:mfcc5:mean 0 audio_naive:mfcc6:mean 0 audio_naive:mfcc7:mean 0 audio_naive:mfcc8:mean 0 audio_naive:mfcc9:mean 0 audio_naive:mfcc10:mean 0 audio_naive:mfcc11:mean 0 audio_naive:mfcc12:mean 0 audio_naive:mfcc0:std 0 audio_naive:mfcc1:std 0 audio_naive:mfcc2:std 0 audio_naive:mfcc4:std 0 discrete:battery_plugged:is_ac 0 audio_naive:mfcc5:std 0 audio_naive:mfcc6:std 0 audio_naive:mfcc7:std 0 audio_naive:mfcc8:std 0 audio_naive:mfcc9:std 0 audio_naive:mfcc10:std 0 audio_naive:mfcc11:std 0 audio_naive:mfcc12:std 0 audio_properties:max_abs_value 0 audio_properties:normalization_multiplier 0 discrete:app_state:is_active 0 discrete:app_state:is_inactive 0 discrete:app_state:is_background 0 discrete:app_state:missing 0 location:log_longitude_range 0 timestamp 0 location:num_valid_updates 0 proc_gyro:magnitude_stats:percentile25 0 raw_acc:3d:std_z 0 raw_acc:3d:ro_xy 0 raw_acc:3d:ro_xz 0 raw_acc:3d:ro_yz 0 proc_gyro:magnitude_stats:mean 0 proc_gyro:magnitude_stats:std 0 proc_gyro:magnitude_stats:moment3 0 proc_gyro:magnitude_stats:moment4 0 proc_gyro:magnitude_stats:percentile50 0 proc_gyro:magnitude_autocorrelation:period 0 proc_gyro:magnitude_stats:percentile75 0 proc_gyro:magnitude_stats:value_entropy 0 proc_gyro:magnitude_stats:time_entropy 0 proc_gyro:magnitude_spectrum:log_energy_band0 0 proc_gyro:magnitude_spectrum:log_energy_band1 0 proc_gyro:magnitude_spectrum:log_energy_band2 0 proc_gyro:magnitude_spectrum:log_energy_band3 0 proc_gyro:magnitude_spectrum:log_energy_band4 0 raw_acc:3d:std_y 0 
raw_acc:3d:std_x 0 raw_acc:3d:mean_z 0 raw_acc:3d:mean_y 0 raw_acc:magnitude_stats:std 0 raw_acc:magnitude_stats:moment3 0 raw_acc:magnitude_stats:moment4 0 raw_acc:magnitude_stats:percentile25 0 raw_acc:magnitude_stats:percentile50 0 raw_acc:magnitude_stats:percentile75 0 raw_acc:magnitude_stats:value_entropy 0 raw_acc:magnitude_stats:time_entropy 0 raw_acc:magnitude_spectrum:log_energy_band0 0 raw_acc:magnitude_spectrum:log_energy_band1 0 raw_acc:magnitude_spectrum:log_energy_band2 0 raw_acc:magnitude_spectrum:log_energy_band3 0 raw_acc:magnitude_spectrum:log_energy_band4 0 raw_acc:magnitude_spectrum:spectral_entropy 0 raw_acc:magnitude_autocorrelation:period 0 raw_acc:magnitude_autocorrelation:normalized_ac 0 raw_acc:3d:mean_x 0 proc_gyro:magnitude_spectrum:spectral_entropy 0 proc_gyro:magnitude_autocorrelation:normalized_ac 0 raw_acc:magnitude_stats:mean 0 raw_magnet:3d:std_y 0 raw_magnet:magnitude_spectrum:log_energy_band4 0 raw_magnet:magnitude_spectrum:spectral_entropy 0 raw_magnet:magnitude_autocorrelation:period 0 raw_magnet:magnitude_autocorrelation:normalized_ac 0 raw_magnet:3d:mean_x 0 raw_magnet:3d:mean_y 0 raw_magnet:3d:mean_z 0 raw_magnet:3d:std_x 0 raw_magnet:3d:std_z 0 proc_gyro:3d:mean_x 0 raw_magnet:3d:ro_xy 0 raw_magnet:3d:ro_xz 0 raw_magnet:3d:ro_yz 0 raw_magnet:avr_cosine_similarity_lag_range0 0 raw_magnet:avr_cosine_similarity_lag_range1 0 raw_magnet:avr_cosine_similarity_lag_range2 0 raw_magnet:avr_cosine_similarity_lag_range3 0 raw_magnet:avr_cosine_similarity_lag_range4 0 raw_magnet:magnitude_spectrum:log_energy_band3 0 raw_magnet:magnitude_spectrum:log_energy_band2 0 raw_magnet:magnitude_spectrum:log_energy_band1 0 raw_magnet:magnitude_spectrum:log_energy_band0 0 proc_gyro:3d:mean_y 0 proc_gyro:3d:mean_z 0 proc_gyro:3d:std_x 0 proc_gyro:3d:std_y 0 proc_gyro:3d:std_z 0 proc_gyro:3d:ro_xy 0 proc_gyro:3d:ro_xz 0 proc_gyro:3d:ro_yz 0 raw_magnet:magnitude_stats:mean 0 raw_magnet:magnitude_stats:std 0 raw_magnet:magnitude_stats:moment3 0 
raw_magnet:magnitude_stats:moment4 0 raw_magnet:magnitude_stats:percentile25 0 raw_magnet:magnitude_stats:percentile50 0 raw_magnet:magnitude_stats:percentile75 0 raw_magnet:magnitude_stats:value_entropy 0 raw_magnet:magnitude_stats:time_entropy 0 discrete:time_of_day:between21and3 0 dtype: int64
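`build_hierarchy` and `format_hierarchy` are called in the next cell but not defined in the notebook. A minimal sketch that reproduces this kind of indented listing, assuming feature names use colon-separated levels (e.g. `raw_acc:magnitude_stats:mean`):

```python
def build_hierarchy(columns):
    """Nest colon-separated column names into a dict of dicts."""
    root = {}
    for col in columns:
        node = root
        for part in col.split(':'):
            node = node.setdefault(part, {})
    return root

def format_hierarchy(node, indent=0):
    """Render the nested dict as an indented bullet list; parents keep a trailing colon."""
    lines = []
    for name, children in node.items():
        suffix = ':' if children else ''
        lines.append('    ' * indent + '- ' + name + suffix)
        lines.append(format_hierarchy(children, indent + 1))
    return '\n'.join(l for l in lines if l)

# Small demonstration on a few feature names
hierarchy = build_hierarchy(['raw_acc:magnitude_stats:mean',
                             'raw_acc:magnitude_stats:std',
                             'lf_measurements:battery_level'])
formatted = format_hierarchy(hierarchy)
```

Passing `X.columns` instead of the demo list yields the full sensor/feature tree shown below.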
hierarchy = build_hierarchy(X.columns)
formatted_hierarchy = format_hierarchy(hierarchy)
print(formatted_hierarchy)
- raw_acc:
    - magnitude_stats:
        - mean
        - std
        - moment3
        - moment4
        - percentile25
        - percentile50
        - percentile75
        - value_entropy
        - time_entropy
    - magnitude_spectrum:
        - log_energy_band0
        - log_energy_band1
        - log_energy_band2
        - log_energy_band3
        - log_energy_band4
        - spectral_entropy
    - magnitude_autocorrelation:
        - period
        - normalized_ac
    - 3d:
        - mean_x
        - mean_y
        - mean_z
        - std_x
        - std_y
        - std_z
        - ro_xy
        - ro_xz
        - ro_yz
- proc_gyro:
    - magnitude_stats:
        - mean
        - std
        - moment3
        - moment4
        - percentile25
        - percentile50
        - percentile75
        - value_entropy
        - time_entropy
    - magnitude_spectrum:
        - log_energy_band0
        - log_energy_band1
        - log_energy_band2
        - log_energy_band3
        - log_energy_band4
        - spectral_entropy
    - magnitude_autocorrelation:
        - period
        - normalized_ac
    - 3d:
        - mean_x
        - mean_y
        - mean_z
        - std_x
        - std_y
        - std_z
        - ro_xy
        - ro_xz
        - ro_yz
- raw_magnet:
    - magnitude_stats:
        - mean
        - std
        - moment3
        - moment4
        - percentile25
        - percentile50
        - percentile75
        - value_entropy
        - time_entropy
    - magnitude_spectrum:
        - log_energy_band0
        - log_energy_band1
        - log_energy_band2
        - log_energy_band3
        - log_energy_band4
        - spectral_entropy
    - magnitude_autocorrelation:
        - period
        - normalized_ac
    - 3d:
        - mean_x
        - mean_y
        - mean_z
        - std_x
        - std_y
        - std_z
        - ro_xy
        - ro_xz
        - ro_yz
    - avr_cosine_similarity_lag_range0
    - avr_cosine_similarity_lag_range1
    - avr_cosine_similarity_lag_range2
    - avr_cosine_similarity_lag_range3
    - avr_cosine_similarity_lag_range4
- location:
    - num_valid_updates
    - log_latitude_range
    - log_longitude_range
    - best_horizontal_accuracy
    - diameter
    - log_diameter
- location_quick_features:
    - std_lat
    - std_long
    - lat_change
    - long_change
    - mean_abs_lat_deriv
    - mean_abs_long_deriv
- audio_naive:
    - mfcc0:
        - mean
        - std
    - mfcc1:
        - mean
        - std
    - mfcc2:
        - mean
        - std
    - mfcc3:
        - mean
        - std
    - mfcc4:
        - mean
        - std
    - mfcc5:
        - mean
        - std
    - mfcc6:
        - mean
        - std
    - mfcc7:
        - mean
        - std
    - mfcc8:
        - mean
        - std
    - mfcc9:
        - mean
        - std
    - mfcc10:
        - mean
        - std
    - mfcc11:
        - mean
        - std
    - mfcc12:
        - mean
        - std
- audio_properties:
    - max_abs_value
    - normalization_multiplier
- discrete:
    - app_state:
        - is_active
        - is_inactive
        - is_background
        - missing
    - battery_plugged:
        - is_ac
        - is_usb
        - is_wireless
        - missing
    - battery_state:
        - is_unknown
        - is_unplugged
        - is_not_charging
        - is_discharging
        - is_charging
        - is_full
        - missing
    - on_the_phone:
        - is_False
        - is_True
        - missing
    - ringer_mode:
        - is_normal
        - is_silent_no_vibrate
        - is_silent_with_vibrate
        - missing
    - wifi_status:
        - is_not_reachable
        - is_reachable_via_wifi
        - is_reachable_via_wwan
        - missing
    - time_of_day:
        - between0and6
        - between3and9
        - between6and12
        - between9and15
        - between12and18
        - between15and21
        - between18and24
        - between21and3
- lf_measurements:
    - battery_level
- timestamp_numeric